Large deviation theory has provided important clues for the choice
of importance sampling measures for Monte Carlo evaluation of ex- 
ceedance probabilities. However, Glasserman and Wang [Ann. Appl. 
Probab. 7 (1997) 731-746] have given examples in which importance 
sampling measures that are consistent with large deviations can per- 
form much worse than direct Monte Carlo. We address this prob- 
lem by using certain mixtures of exponentially twisted measures for 
importance sampling. Their asymptotic optimality is established by 
using a new class of likelihood ratio martingales and renewal theory. 



1. Introduction. Importance sampling is a powerful technique to com- 
pute the probabilities of rare events by Monte Carlo simulation. For an event 
occurring with probability $10^{-4}$, one expects the occurrence of 1 event in
every 10,000 simulation runs. Therefore, to generate 100 events would require
around one million runs for direct Monte Carlo. To simulate a small
probability $P(A)$, importance sampling changes the measure $P$ to $Q$ under
which $A$ is no longer a rare event and evaluates $P(A)$ by

(1.1)    $P(A) = \int L 1_A \, dQ = E_Q(L 1_A)$,

where $L = dP/dQ$ is the likelihood ratio. Whereas
$\mathrm{Var}_P(1_A) = P(A) - P^2(A)$,

(1.2)    $\mathrm{Var}_Q(L 1_A) = E_Q(L^2 1_A) - P^2(A)$.

Therefore, $\mathrm{Var}_Q(L 1_A)$ can be of the order $O((P(A))^2)$ if
$L 1_A \le \varepsilon = O(P(A))$, whereas $\mathrm{Var}_P(1_A) \sim P(A)$ as
$P(A) \to 0$.
In practice, it may be difficult to find $Q$ such that $(dP/dQ) 1_A$ is bounded
by $\varepsilon = O(P(A))$. What is needed is that $E_Q(L^2 1_A)$ be of the
order $O((P(A))^2)$, which is much weaker than requiring $L 1_A$ to be bounded
by $O(P(A))$. Note that $E_Q L = 1$ whereas $E_Q(L 1_A) = P(A)$ and $A$ is not
a rare event under $Q$. Therefore, even though one does not have to choose $Q$
such that $L 1_A$ is bounded by some small number, one has to be careful to
avoid the situation where $L 1_A$ is small with a large $Q$-probability but so
large with a small $Q$-probability that $E_Q(L^2 1_A)$ is of a larger order of
magnitude than $(P(A))^2$. Glasserman
and Wang [10] have given examples to show how easily such situations can 
arise and "how poorly seemingly optimal estimators can perform" when one 
does not pay attention to avoid such situations. Their paper also gives a 
brief review of previous work on the choice of Q based on large deviation 
theory to evaluate exceedance probabilities of random walks, and provides 
examples for two types of exceedance probabilities which we describe in 
greater generality below. 
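
The identity (1.1) and the variance comparison above can be illustrated by a minimal sketch. It assumes, purely for illustration, that $P$ is the standard normal law, $A = \{x > a\}$ and $Q = N(a, 1)$, so that $L(x) = e^{-ax + a^2/2}$; the function names and the values of `a` and `m` are hypothetical and not taken from the paper.

```python
import math
import random

def normal_tail(a):
    """Exact P(Z > a) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))

def direct_mc(a, m, rng):
    """Direct Monte Carlo: the P-average of the indicator of A = {x > a}."""
    hits = sum(1 for _ in range(m) if rng.gauss(0.0, 1.0) > a)
    return hits / m

def importance_sampling(a, m, rng):
    """Estimate P(Z > a) as the Q-average of L * 1_A, with Q = N(a, 1).

    Returns the estimate and the sample second moment of L * 1_A.
    """
    total = 0.0
    total_sq = 0.0
    for _ in range(m):
        x = rng.gauss(a, 1.0)                      # draw from Q
        if x > a:                                  # indicator of the event A
            lr = math.exp(-a * x + 0.5 * a * a)    # L = dP/dQ evaluated at x
            total += lr
            total_sq += lr * lr
    return total / m, total_sq / m

rng = random.Random(12345)
a = 4.0
p_exact = normal_tail(a)
p_direct = direct_mc(a, 20000, rng)
p_is, second_moment = importance_sampling(a, 20000, rng)
print(p_exact, p_direct, p_is, second_moment)
```

With the same number of runs, the direct estimate is typically 0 or a multiple of 1/m, while the importance sampling estimate has a second moment that is several orders of magnitude smaller than $P(A)$.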

Let $\xi, \xi_1, \xi_2, \ldots$ be i.i.d. $d$-dimensional random vectors with
common distribution $F$ such that $\psi(\theta) := \log(Ee^{\theta'\xi}) < \infty$
for $\|\theta\| < \theta_0$. Let $S_n = \xi_1 + \cdots + \xi_n$, $\mu_0 = E\xi$,
$\Theta = \{\theta : \psi(\theta) < \infty\}$, and let $\Lambda$ be the closure
of $\nabla\psi(\Theta)$ and $\Lambda^o$ be its interior. Here and in the sequel
we use $\nabla\psi$ to denote the gradient vector and $\nabla^2\psi$ the Hessian
matrix of second partial derivatives of $\psi$. Then $\nabla\psi$ is a
diffeomorphism from $\Theta^o$ onto $\Lambda^o$. Letting
$\theta_\mu = (\nabla\psi)^{-1}(\mu)$, define

(1.3)    $\phi(\mu) = \sup_{\theta \in \Theta} \{\theta'\mu - \psi(\theta)\} = \theta_\mu'\mu - \psi(\theta_\mu)$,

which is called the rate function in the theory of large deviations. We can
embed $F$ in the exponential family $\{F_\theta, \theta \in \Theta\}$ with
$dF_\theta(x) = e^{\theta'x - \psi(\theta)} dF(x)$. Letting
$g : \Lambda \to \mathbf{R}$, we consider in Section 2 the exceedance
probabilities

(1.4)    $p_c = P\{\max_{n_0 \le n \le n_1} n g(S_n/n) > c\}$,

(1.5)    $p_n = P\{g(S_n/n) > b\}$ with $b > g(\mu_0)$.
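
The definition (1.3) can be evaluated numerically in the scalar case by solving $\psi'(\theta) = \mu$ for $\theta_\mu$ and substituting. The sketch below is only an illustration under assumed inputs: the two cumulant generating functions (standard normal and unit exponential) and the bisection brackets are choices made here, not taken from the paper.

```python
import math

def theta_of_mu(dpsi, mu, lo, hi, tol=1e-12):
    """Solve dpsi(theta) = mu by bisection on (lo, hi); dpsi must be increasing."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if dpsi(mid) < mu:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def rate_function(psi, dpsi, mu, lo, hi):
    """phi(mu) = theta_mu * mu - psi(theta_mu), as in (1.3) with d = 1."""
    theta = theta_of_mu(dpsi, mu, lo, hi)
    return theta * mu - psi(theta)

# Standard normal increments: psi(theta) = theta^2 / 2, so phi(mu) = mu^2 / 2.
phi_normal = rate_function(lambda t: 0.5 * t * t, lambda t: t, 1.5, -50.0, 50.0)

# Unit exponential increments: psi(theta) = -log(1 - theta) for theta < 1,
# so phi(mu) = mu - 1 - log(mu).
phi_exp = rate_function(lambda t: -math.log(1.0 - t),
                        lambda t: 1.0 / (1.0 - t),
                        2.0, -50.0, 1.0 - 1e-12)

print(phi_normal, phi_exp)
```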

Let $Q_n$ (or $P_n$) denote the restriction of $Q$ (or $P$) to the
$\sigma$-field $\mathcal{F}_n$ generated by $\xi_1, \ldots, \xi_n$, and let
$P_{\mu,n}$ denote the joint distribution of i.i.d. $\xi_1, \ldots, \xi_n$ with
common distribution $F_{\theta_\mu}$ and having mean $\mu$. For a stopping time
$T$, we also denote the restriction of $Q$ (or $P$, $P_\mu$) to the stopped
$\sigma$-field $\mathcal{F}_T$ by $Q_T$ (or $P_T$, $P_{\mu,T}$). In the special
case $d = 1$ and $g(x) = x^2$ of (1.5) considered by Glasserman and Wang [10],

         $p_n = P\{|S_n|/n > \sqrt{b}\} = P\{|S_n| > an\}$,

where $a = \sqrt{b} > |\mu_0|$ and $a \in \Lambda^o$. By large deviation theory,
$n^{-1} \log P\{S_n > an\} \to -\phi(a)$ and
$n^{-1} \log P\{S_n < -an\} \to -\phi(-a)$. Suppose $\phi(a) < \phi(-a)$.
Then $p_n \sim P\{S_n > an\}$ and

(1.6)    $n^{-1} \log L_n \xrightarrow{P} -\phi(a) = \lim_{n\to\infty} n^{-1} \log P\{|S_n| > an\}$,

where $L_n = dP_n/dP_{a,n}$. Therefore choosing $Q_n = P_{a,n}$ as the importance
sampling measure in (1.1) for Monte Carlo computation of $P\{S_n > an\}$ is
"consistent with large deviations," in the terminology of Glasserman and
Wang ([10], page 734), whose Theorem 2 also shows, however, that

(1.7)    $\lim_{n\to\infty} E_{Q_n}(L_n^2 1_{\{|S_n| > an\}}) = \infty$ if $\theta_a + \theta_{-a} > 0$.

Since $\mathrm{Var}_P(1_{\{|S_n| > an\}}) \sim P\{|S_n| > an\} = e^{-\{\phi(a)+o(1)\}n}$,
(1.7) implies that using the importance sampling measure $Q_n = P_{a,n}$
performs much worse than direct Monte Carlo.

Noting that A has two "minimum rate points" ±a, Glasserman and Wang 
[10] point out that the preceding difficulty with importance sampling
disappears if one uses a mixture $Q_n = pP_{a,n} + (1 - p)P_{-a,n}$ over the
minimum rate points ($0 < p < 1$), following an earlier suggestion of Sadowsky
and Bucklew [17] who have shown that these mixture-type importance sampling
measures are "asymptotically efficient" in the sense that

(1.8)    $E_{Q_n}(L_n^2 1_{\{|S_n| > an\}}) = e^{-\{2\phi(a)+o(1)\}n}$.

In Section 2 we give a considerably more precise definition of asymptotic
optimality, replacing the right-hand side of (1.8) by $O(\sqrt{n} p_n^2)$ which
we show to be the asymptotically minimal order of the left-hand side over
reasonable choices of $Q_n$. More importantly, we provide a much more general
way for constructing the asymptotically efficient importance sampling
distribution than taking a mixture of $P_{\mu,n}$ over the set of minimum rate
points $\mu$, which Sadowsky and Bucklew [17] assume to be a finite set, for
general functions $g$ in (1.5).
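
The two-point mixture over the minimum rate points $\pm a$ can be sketched as follows. The example assumes standard normal increments (so $\psi(\theta) = \theta^2/2$, $\theta_\mu = \mu$ and $\phi(a) = \phi(-a)$) and $p = 1/2$; these choices, together with the function names and the values of `a`, `n` and `m`, are illustrative assumptions rather than the setting of [10] or [17].

```python
import math
import random

def simulate_sum(n, mean, rng):
    """S_n for i.i.d. N(mean, 1) increments, i.e. a draw under P_{mean,n}."""
    return sum(rng.gauss(mean, 1.0) for _ in range(n))

def mixture_estimate(a, n, m, p_mix, rng):
    """Estimate P{|S_n| > a n} under Q_n = p P_{a,n} + (1 - p) P_{-a,n}.

    Returns the estimate and the sample second moment of L_n * 1_{|S_n| > a n}.
    """
    total = 0.0
    total_sq = 0.0
    for _ in range(m):
        mean = a if rng.random() < p_mix else -a     # pick a mixture component
        s = simulate_sum(n, mean, rng)
        if abs(s) > a * n:
            # dQ_n/dP_n = p e^{a S_n - n a^2/2} + (1 - p) e^{-a S_n - n a^2/2}
            dq_dp = (p_mix * math.exp(a * s - 0.5 * n * a * a)
                     + (1.0 - p_mix) * math.exp(-a * s - 0.5 * n * a * a))
            lr = 1.0 / dq_dp                         # L_n = dP_n/dQ_n
            total += lr
            total_sq += lr * lr
    return total / m, total_sq / m

rng = random.Random(2024)
a, n, m = 0.5, 25, 10000
p_exact = math.erfc(a * math.sqrt(n) / math.sqrt(2.0))   # 2 * (1 - Phi(a sqrt(n)))
p_mixture, second_moment = mixture_estimate(a, n, m, 0.5, rng)
print(p_exact, p_mixture, second_moment)
```

On the event $\{|S_n| > an\}$ the likelihood ratio of the mixture is bounded by $2e^{-na^2/2}$ in this Gaussian setting, which is what prevents the blow-up described in (1.7).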

Glasserman and Wang [10] also consider (1.4) for the special case $d = 2$
and $g(\mu) = \max(\mu_1, \mu_2)$, using $x_j$ to denote the $j$th component of
a vector $x$. They assume that $E\xi_{1j} < 0$ for $j = 1, 2$. Setting $n_0 = 1$
and letting $n_1 \to \infty$, this special case of (1.4) reduces to

         $p_c = P\{\max(S_{n,1}, S_{n,2}) > c$ for some $n \ge 1\} = P\{T_c < \infty\} \sim P\{\tau_c^{(1)} < \infty\}$

if $\gamma_1 < \gamma_2$, where $\gamma_1$ and $\gamma_2$ are the positive
solutions of $\psi(\gamma_1, 0) = 0 = \psi(0, \gamma_2)$ and
$\tau_c^{(j)} = \inf\{n : S_{n,j} > c\}$, $T_c = \min(\tau_c^{(1)}, \tau_c^{(2)})$.
In fact, by Cramer's theorem (cf. [9], page 378),
$P\{\tau_c^{(j)} < \infty\} \sim A_j e^{-\gamma_j c}$ (in which $A_j$ is a
positive constant not depending on $c$). Glasserman and Wang ([10],
Proposition 2) have shown that choosing $Q$ to be the measure under which
$\xi_1, \xi_2, \ldots$ are i.i.d. with common distribution $F_{(\gamma_1,0)}$
for Monte Carlo computation of $P\{T_c < \infty\}$ is "consistent with large
deviations," in the sense that

(1.9)    $e^{\gamma_1 c} L_{T_c}$ has a nondegenerate limiting distribution as $c \to \infty$.
However, they have also shown that if
$\min\{\theta_1 : \psi(\theta_1, \theta_2) = 0$ for some $\theta_2\} > -\gamma_1$,
then

(1.10)   $\lim_{c\to\infty} E_Q(L_{T_c}^2 1_{\{T_c < \infty\}}) = \infty$,

and therefore this choice of the importance sampling measure $Q$ gives much
larger standard error than the direct Monte Carlo estimate of
$P\{T_c < \infty\}$, for which
$E_P(1_{\{T_c < \infty\}}^2) = P\{T_c < \infty\} \sim A_1 e^{-\gamma_1 c}$. In
Section 2 we resolve this difficulty with importance sampling based on large
deviation tilting by using a mixture of the form

(1.11)   $Q_{T_c \wedge n_1} = \int P_{\mu, T_c \wedge n_1} w_c(\mu) \, d\mu$

for Monte Carlo evaluation of the general boundary crossing probability
(1.4). We provide an explicit formula for $w_c(\mu)$ and make use of Theorem
1 of [4] to show that this choice of $Q$ is asymptotically optimal in the sense
that $E_Q(L_{T_c}^2 1_{\{T_c < \infty\}})$ attains the asymptotically minimal
order of $p_c^2$.

Section 3 generalizes the methods and results of Section 2 to the case
where $S_n$ is a Markov random walk, in which $\xi_n$ has distribution
$F(\cdot | X_n, X_{n-1})$ depending on a Markov chain $\{X_n\}$. Whereas the
methods and results of [4] for asymptotic approximations of (1.4) and (1.5)
when the increments $\xi_i$ of $S_n$ are i.i.d. provide basic tools for the
derivation of the asymptotically optimal importance sampling measure $Q$ in
Section
2, the extension to Markov random walks in Section 3 requires new prob- 
abilistic ideas. One important idea, given in Section 3.1, is a modification 
of the usual likelihood ratio martingale to circumvent difficulties with the 
analysis of eigenfunctions in the Ney-Nummelin [15] formula for likelihood 
ratios. Section 3.2 develops a new renewal-theoretic approach to the analysis 
of i.i.d. blocks between regeneration times introduced by Ney and Nummelin 
[15] for Markov random walks satisfying their minorization condition. Com- 
bining these new tools with the results and methods in [5] for the analysis 
of boundary crossing probabilities, Section 3.3 generalizes (1.11) to Markov 
random walks. Further refinements of these ideas are used in Section 3.4 for 
the exceedance probability (1.5). 

The complexity due to Markov dependence and nonlinearity in multidi- 
mensional settings causes not only analytic difficulties that we resolve in 
Sections 2 and 3 but also implementation difficulties as the asymptotically 
optimal importance sampling measure developed in these sections may be 
difficult to sample directly from. In Section 4 we describe numerical methods 
to address certain implementation issues and provide numerical examples to 
illustrate the effectiveness of the methods.

2. Asymptotically optimal importance sampling measure for Monte Carlo
evaluation of exceedance probabilities. In this section, we derive
asymptotically optimal importance sampling measures $Q_c^*$ and $Q_n^*$ for
Monte Carlo evaluation of the boundary crossing probability (1.4) and the tail
probability (1.5). In particular, we give an explicit formula (2.1) for a
mixing density $w_c(\mu)$ in (1.11) that yields $Q_c^*$. The measure $Q_n^*$
involves a similar mixing density $w_n(\mu)$ given by (2.13).

2.1. Boundary crossing probabilities. Let
$T_c = \inf\{n \ge n_0 : n g(S_n/n) > c\}$. Then (1.4) can be written as
$p_c = P\{T_c \le n_1\}$. To derive an asymptotically optimal importance
sampling measure $Q_c^*$ for Monte Carlo evaluation of $p_c$, we assume the
following regularity conditions (A1)-(A5) on $g$ that have been used by Chan
and Lai [4] to develop large deviation approximations to $p_c$. Define the rate
function $\phi$ by (1.3) and let $\partial\Lambda$ be the boundary of
$\Lambda$, $|\cdot|$ denote the determinant of a square matrix,
$\Sigma(\mu) = \nabla^2\psi(\theta_\mu)$, and $TM(\mu)$ be the tangent space and
$TM^\perp(\mu)$ the normal space of a manifold $M$ at $\mu$.

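(A1) There exist $0 < \delta < a < \infty$ and $0 < \varepsilon_0 < a^{-1}$ such
that $n_0 \sim \delta c$, $n_1 \sim ac$ and

         $\sup_{a^{-1} - \varepsilon_0 < g(\mu) < \delta^{-1} + \varepsilon_0} g(\mu)/\phi(\mu) = r < \infty$.

(A2) $M_\varepsilon := \{\mu : a^{-1} - \varepsilon < g(\mu) < \delta^{-1} + \varepsilon$
and $g(\mu)/\phi(\mu) = r\}$ is a $q$-dimensional oriented manifold for all
$0 \le \varepsilon \le \varepsilon_0$, where $q \le d$.

(A3) $\liminf_{\mu \to \partial\Lambda} \phi(\mu) > (\delta r)^{-1}$ and there
exists $\varepsilon_1 > 0$ such that $\phi(\mu) > (\delta r)^{-1} + \varepsilon_1$
if $g(\mu) > \delta^{-1} + \varepsilon_0$.

(A4) $g$ is twice continuously differentiable and
$\sigma(\{\mu : g(\mu) = \delta^{-1}$ and $g(\mu)/\phi(\mu) = r\}) = 0$, where
$\sigma$ is the volume element measure of $M_{\varepsilon_0}$.

(A5) $\inf_{\mu \in M_0} |\nabla_\perp^2 \rho(\mu)| > 0$ with
$\rho = \phi - g/r$, where
$\nabla_\perp^2 \rho(\mu) = (\Pi_\mu^\perp)' \nabla^2\rho(\mu) \Pi_\mu^\perp$ and
$\Pi_\mu^\perp$ denotes the $d \times (d - q)$ matrix whose column vectors form
an orthonormal basis of $TM_0^\perp(\mu)$ in the case $d > q$, and we set
$|\nabla_\perp^2 \rho(\mu)| = 1$ if $d = q$.

Chan and Lai [4] have given a number of important statistical applications
in which (A1)-(A5) are satisfied. In particular, if $g = \phi$, then (A1)-(A5)
hold with $r = 1$, $q = d$ and
$M_\varepsilon = \{\mu : a^{-1} - \varepsilon < g(\mu) < \delta^{-1} + \varepsilon\}$.
The linear function $g(\mu) = r[\theta_{\mu_0}'\mu - \psi(\theta_{\mu_0})]$ also
satisfies (A1)-(A5) with $M_\varepsilon = \{\mu_0\}$ and $q = 0$ if
$a^{-1} < g(\mu_0) < \delta^{-1}$, but violates (A4) if $g(\mu_0) = \delta^{-1}$.
Under (A1)-(A5), let
$\Lambda^* = \{\mu \in \Lambda : \phi(\mu) < (\delta r)^{-1} + \varepsilon_1$ and
$\delta^{-1} + \varepsilon_0 > g(\mu) > a^{-1} - \varepsilon_0\}$ and define

(2.1)    $w_c(\mu) = \beta_c \{ [g(\mu)]^{-d/2} e^{-c\phi(\mu)/g(\mu)} 1_{\{\mu \in \Lambda^*\}} + \delta^{d/2} e^{-n_0\phi(\mu)} 1_{\{\phi(\mu) > (\delta r)^{-1} + \varepsilon_1/2\}} \}$,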
where $\beta_c$ is a normalizing constant such that
$\int_\Lambda w_c(\mu) \, d\mu = 1$. With this choice of $w_c$, define $Q_c^*$ by
the right-hand side of (1.11). The importance sampling method to evaluate
$p_c$ by Monte Carlo involves generating $m$ independent samples
$(\xi_1^{(i)}, \ldots, \xi_{T_c^{(i)} \wedge n_1}^{(i)})$, $i = 1, \ldots, m$,
from $Q_c^*$ so that

(2.2)    $\hat{p}_c = m^{-1} \sum_{i=1}^m L_c^{(i)} 1_{\{T_c^{(i)} \le n_1\}}$

provides an unbiased estimate of $p_c$, where

(2.3)    $\frac{1}{L_c^{(i)}} = \frac{dQ_c^*}{dP_{T_c \wedge n_1}}(\xi_1^{(i)}, \ldots, \xi_{T_c^{(i)} \wedge n_1}^{(i)}) = \int_\Lambda \exp\{\theta_\mu' S^{(i)}_{T_c^{(i)} \wedge n_1} - (T_c^{(i)} \wedge n_1)\psi(\theta_\mu)\} w_c(\mu) \, d\mu$.

In Section 4, we give details about how to draw the $\xi^{(i)}$ from the
mixture distribution $Q_c^*$.

To explain the motivation underlying the definition of $w_c(\mu)$, we begin
by considering importance sampling to evaluate $P\{S_n/n \in A_n\}$ for a closed
bounded set $A_n$ such that $E\xi_1 \notin A_n$. An asymptotically optimal
importance density is one that is proportional to
$e^{-n\phi(\mu)} 1_{\{\mu \in A_n\}}$. This suggests that to simulate the
probability of the event

         $\{T_c \le n_1\} = \{n g(S_n/n) > c$ for some $n_0 \le n \le n_1\} = \bigcup_{n=n_0}^{n_1} \{n g(S_n/n) > c\}$,

it may be optimal to choose an importance density that is proportional to

         $\sup_{n_0 \le n \le n_1} e^{-n\phi(\mu)} 1_{\{n g(\mu) > c\}} = e^{-[c/g(\mu)]\phi(\mu)} 1_{\{c/n_0 \ge g(\mu) > c/n_1\}} + e^{-n_0\phi(\mu)} 1_{\{g(\mu) > c/n_0\}}$,

in which the supremum on the left-hand side is taken over all real numbers
lying between $n_0$ and $n_1$. The formula (2.1) modifies this slightly to
facilitate the proof of asymptotic optimality.
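
The closed form of the supremum in the preceding display can be checked against a brute-force maximization over a grid of real $n$. The sketch below assumes, for illustration only, the rate function $\phi(\mu) = \mu^2/2$ of standard normal increments and a hypothetical boundary function $g(\mu) = \mu^2$; the values of `c`, `n0` and `n1` are likewise arbitrary.

```python
import math

def phi(mu):
    # Rate function of N(0, 1) increments (an assumption made for this sketch).
    return 0.5 * mu * mu

def g(mu):
    # Hypothetical boundary function, chosen only for illustration.
    return mu * mu

def sup_closed_form(mu, c, n0, n1):
    """Right-hand side of the display: the two indicator terms."""
    gm = g(mu)
    if gm > c / n0:
        return math.exp(-n0 * phi(mu))
    if gm > c / n1:  # here c/n0 >= g(mu) > c/n1
        return math.exp(-(c / gm) * phi(mu))
    return 0.0

def sup_brute_force(mu, c, n0, n1, grid=3000):
    """Maximize exp(-n * phi(mu)) over grid points n in [n0, n1] with n * g(mu) > c."""
    best = 0.0
    for i in range(grid + 1):
        n = n0 + (n1 - n0) * i / grid
        if n * g(mu) > c:
            best = max(best, math.exp(-n * phi(mu)))
    return best

c, n0, n1, grid = 20.0, 10.0, 40.0, 3000
step = (n1 - n0) / grid
results = []
for mu in (0.5, 0.8, 1.0, 1.5):
    exact = sup_closed_form(mu, c, n0, n1)
    approx = sup_brute_force(mu, c, n0, n1, grid)
    results.append((mu, exact, approx))
    print(mu, exact, approx)
```

Because the supremum is approached at the smallest admissible $n$, the grid value can fall short of the closed form by at most a factor $e^{-h\phi(\mu)}$, where $h$ is the grid spacing.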

We call an importance sampling measure $Q_c$ asymptotically optimal for
evaluating $p_c$ if

(2.4)    $E_{Q_c}\left[\left(\frac{dP_{T_c \wedge n_1}}{dQ_c}\right)^2 1_{\{T_c \le n_1\}}\right] = O(p_c^2)$.

As will be shown in Section 4, there is considerable flexibility in the choice
of an asymptotically optimal mixing density. Since
$E_{Q_c}[(dP_{T_c \wedge n_1}/dQ_c) 1_{\{T_c \le n_1\}}] = p_c$, the left-hand
side of (2.4) is $\ge p_c^2$ by the Cauchy-Schwarz inequality, so the right-hand
side of (2.4) indeed gives an asymptotically minimal order to justify the
"asymptotic optimality" of (2.4). The following theorem establishes the
asymptotic optimality of $Q_c^*$ defined by (1.11) and (2.1).

Theorem 1. Assume (A1)-(A5) and define $Q_c^*$ by (1.11) and (2.1).
Then $Q_c^*$ satisfies (2.4) and is therefore an asymptotically optimal
importance sampling measure.

Proof. By considering $g/r$ and $c/r$, we can assume without loss of
generality that $r = 1$. We first assume also that $F$ is nonlattice so that
Theorem 1 of [4] can be applied, yielding

(2.5)    $p_c \sim C c^{q/2} e^{-c}$

for some $C > 0$. By (2.1) and (A3),

(2.6)    $\beta_c^{-1} = \int_\Lambda [w_c(\mu)/\beta_c] \, d\mu \le \delta^{d/2} \int_{\phi(\mu) > \delta^{-1} + \varepsilon_1/2} e^{-n_0\phi(\mu)} \, d\mu + (a^{-1} - \varepsilon_0)^{-d/2} e^{-c} \int_{a^{-1} - \varepsilon_0 < g(\mu) < \delta^{-1} + \varepsilon_0} e^{-c\rho(\mu)/g(\mu)} \, d\mu$.

Making use of (A5) and arguments similar to those in the proofs of Theorems
1 and 2 of [4], it can be shown that

(2.7)    $\int_{a^{-1} - \varepsilon_0 < g(\mu) < \delta^{-1} + \varepsilon_0} e^{-c\rho(\mu)/g(\mu)} \, d\mu = O(c^{-(d-q)/2})$,

(2.8)    $\int_{\phi(\mu) > \delta^{-1} + \varepsilon_1/2} e^{-n_0\phi(\mu)} \, d\mu = O(n_0^{-(d-1)/2} e^{-n_0(\delta^{-1} + \varepsilon_1/2)}) = O(c^{-(d-1)/2} e^{-c(1 + \delta\varepsilon_1/3)})$.

Combining (2.7) and (2.8) with (2.6) yields

(2.9)    $\beta_c^{-1} = O(c^{-(d-q)/2} e^{-c})$ as $c \to \infty$.

Let $B(c; \tilde\mu) = \{\mu : \|\mu - \tilde\mu\| \le c^{-1/2}\}$. Recalling
that $n_0 \sim \delta c$ and $n_1 \sim ac$, we show in the next paragraph that
as $c \to \infty$,

(2.10)   $\beta_c \Big/ \left( \int_{B(c;\tilde\mu)} e^{T[\theta_\mu'\tilde\mu - \psi(\theta_\mu)]} w_c(\mu) \, d\mu \right) = O(c^{d/2})$

uniformly in $n_0 \le T \le n_1$ and $T g(\tilde\mu) > c$. Let
$\bar\xi_n = S_n/n$. From (2.9) and (2.10), it follows that

(2.11)   $\left( \int_\Lambda e^{T_c[\theta_\mu'\bar\xi_{T_c} - \psi(\theta_\mu)]} w_c(\mu) \, d\mu \right)^{-2} 1_{\{T_c \le n_1\}} = O(c^q e^{-2c})$,

recalling that $T_c g(\bar\xi_{T_c}) > c$. In view of (2.3), the desired
conclusion (2.4) for $Q_c^*$ follows from (2.5) and (2.11).
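
To prove (2.10), first consider the case
$\inf_{\mu \in B(c;\tilde\mu)} \phi(\mu) > \delta^{-1} + \varepsilon_1/2$. Then
for $T \ge n_0$,

         $\int_{B(c;\tilde\mu)} e^{T[\theta_\mu'\tilde\mu - \psi(\theta_\mu)]} [w_c(\mu)/\beta_c] \, d\mu \ge \delta^{d/2} \int_{B(c;\tilde\mu)} e^{T[\theta_\mu'\tilde\mu - \psi(\theta_\mu)] - T\phi(\mu)} \, d\mu = \delta^{d/2} \int_{B(c;\tilde\mu)} e^{T\theta_\mu'(\tilde\mu - \mu)} \, d\mu$,

so (2.10) holds. The complementary case
$\inf_{\mu \in B(c;\tilde\mu)} \phi(\mu) \le \delta^{-1} + \varepsilon_1/2$
implies that there exists $A > 0$ such that $\|\tilde\mu\| \le A$, uniformly in
$c \ge 1$. Since

         $\sup_{\|\tilde\mu\| \le A} \left[ \sup_{\mu \in B(c;\tilde\mu)} \phi(\mu) - \inf_{\mu \in B(c;\tilde\mu)} \phi(\mu) \right] < \varepsilon_1/2$

for all large $c$, it suffices to consider the case
$\sup_{\mu \in B(c;\tilde\mu)} \phi(\mu) < \delta^{-1} + \varepsilon_1$.
In this case, for $T \le n_1$ and $T g(\tilde\mu) > c$ with $c$ sufficiently
large, $g(\tilde\mu) > c/n_1 \ge a^{-1} + o(1)$, so $\mu \in \Lambda^*$ for all
$\mu \in B(c;\tilde\mu)$. Therefore, letting
$\zeta = \inf_{\mu \in \Lambda^*} [g(\mu)]^{-d/2}$,

         $\int_{B(c;\tilde\mu)} [w_c(\mu)/\beta_c] \exp\{T[\theta_\mu'\tilde\mu - \psi(\theta_\mu)]\} \, d\mu$
         $\ge \zeta \int_{B(c;\tilde\mu)} \exp\{T\theta_\mu'(\tilde\mu - \mu) + [T - c/g(\tilde\mu)]\phi(\mu) + c[1/g(\tilde\mu) - 1/g(\mu)]\phi(\mu)\} \, d\mu$
         $\ge \zeta e^{-\eta/2} \mathrm{Vol}(B(c;\tilde\mu) \cap \{\mu : (\mu - \tilde\mu)'\nabla f(\tilde\mu) > 0\})$,

where $f(\mu) = (T/c)\theta_\mu'(\tilde\mu - \mu) + [1/g(\tilde\mu) - 1/g(\mu)]\phi(\mu)$
so that $f(\tilde\mu) = 0$, and Taylor's theorem yields $\eta > 0$ such that
$f(\mu) \ge (\mu - \tilde\mu)'\nabla f(\tilde\mu) - \eta\|\mu - \tilde\mu\|^2/2$
for all $\mu \in B(c;\tilde\mu)$ and large $c$. It then follows that (2.10) also
holds when $\sup_{\mu \in B(c;\tilde\mu)} \phi(\mu) < \delta^{-1} + \varepsilon_1$,
noting that $\zeta \ge (\delta^{-1} + \varepsilon_1)^{-d/2}$ by (A1), with
$r = 1$, in this case.

When $F$ is lattice, the preceding arguments can still be used with some
minor modifications. In particular, the asymptotic formula (2.5) can be
replaced by the weaker result

(2.12)   $0 < \liminf_{c\to\infty} p_c/\{c^{q/2} e^{-c}\} \le \limsup_{c\to\infty} p_c/\{c^{q/2} e^{-c}\} < \infty$

in the lattice case, which suffices to yield (2.4) for $Q_c^*$ from (2.3) and
(2.10); see the remark following the proof of Theorem 2. □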

2.2. Tail probabilities of $g(S_n/n)$. Define

(2.13)   $w_n(\mu) = \beta_n e^{-n\phi(\mu)} 1_{\{g(\mu) > b\}}$, $\mu \in \Lambda$,

where $\phi$ is the rate function given in (1.3) and $\beta_n$ is a normalizing
constant such that $\int_\Lambda w_n(\mu) \, d\mu = 1$. Let

(2.14)   $Q_n^* = \int_\Lambda P_{\mu,n} w_n(\mu) \, d\mu$.

We propose to use $Q_n^*$ as the importance sampling measure from which
$(\xi_1^{(i)}, \ldots, \xi_n^{(i)})$, $i = 1, \ldots, m$, are generated so that

(2.15)   $\hat{p}_n = m^{-1} \sum_{i=1}^m L_n^{(i)} 1_{\{g(S_n^{(i)}/n) > b\}}$

provides a Monte Carlo estimate of $p_n$, where
$S_n^{(i)} = \xi_1^{(i)} + \cdots + \xi_n^{(i)}$ and

         $L_n^{(i)} = \frac{dP_n}{dQ_n^*}(\xi_1^{(i)}, \ldots, \xi_n^{(i)}) = \left\{ \int_\Lambda e^{\theta_\mu' S_n^{(i)} - n\psi(\theta_\mu)} w_n(\mu) \, d\mu \right\}^{-1}$.

Note that $\hat{p}_n$ is an unbiased estimate of $p_n$ with

(2.16)   $\mathrm{Var}(\hat{p}_n) = m^{-1} \mathrm{Var}_{Q_n^*}(L_n 1_{\{g(S_n/n) > b\}}) = [E_{Q_n^*}(L_n^2 1_{\{g(S_n/n) > b\}}) - p_n^2]/m$.
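
The estimator (2.15) can be sketched in a case where every ingredient of (2.13) and (2.14) is available in closed form. The example assumes, for illustration only, $d = 1$, standard normal increments (so $\theta_\mu = \mu$, $\psi(\theta) = \theta^2/2$, $\phi(\mu) = \mu^2/2$) and $g(\mu) = \mu$ with $b > 0$. Under these assumptions $w_n$ is a normal density truncated to $(b, \infty)$ and $dQ_n^*/dP_n = \beta_n \int_b^\infty e^{\mu S_n - n\mu^2} \, d\mu$ reduces to a Gaussian tail integral. The rejection sampler and all names below are hypothetical choices made for this sketch, not the sampling method of Section 4.

```python
import math
import random

def sample_truncated_normal(t, rng):
    """Draw Z ~ N(0, 1) conditioned on Z > t (t > 0) by exponential rejection."""
    while True:
        z = t + rng.expovariate(t)
        if rng.random() < math.exp(-0.5 * (z - t) ** 2):
            return z

def mixture_tail_estimate(b, n, m, rng):
    """Estimate p_n = P{S_n/n > b} by sampling from the mixture Q*_n.

    Returns the estimate and the sample second moment of L_n * 1_{S_n/n > b}.
    """
    sqrt_n = math.sqrt(n)
    # 1/beta_n = integral over mu > b of exp(-n mu^2 / 2)
    inv_beta = (math.sqrt(2.0 * math.pi / n)
                * 0.5 * math.erfc(b * sqrt_n / math.sqrt(2.0)))
    total = 0.0
    total_sq = 0.0
    for _ in range(m):
        mu = sample_truncated_normal(b * sqrt_n, rng) / sqrt_n   # mu ~ w_n
        s = sum(rng.gauss(mu, 1.0) for _ in range(n))            # S_n under P_{mu,n}
        if s / n > b:
            # integral over mu > b of exp(mu * S_n - n * mu^2), by completing the square
            integral = (math.exp(s * s / (4.0 * n)) * math.sqrt(math.pi / n)
                        * 0.5 * math.erfc((b - s / (2.0 * n)) * sqrt_n))
            lr = inv_beta / integral          # L_n = dP_n/dQ*_n = 1 / (beta_n * integral)
            total += lr
            total_sq += lr * lr
    return total / m, total_sq / m

rng = random.Random(7)
b, n, m = 0.5, 30, 10000
p_exact = 0.5 * math.erfc(b * math.sqrt(n) / math.sqrt(2.0))
p_hat, second_moment = mixture_tail_estimate(b, n, m, rng)
print(p_exact, p_hat, second_moment)
```

The second value returned is the sample analogue of $E_{Q_n^*}(L_n^2 1_{\{g(S_n/n) > b\}})$ in (2.16), so its ratio to $p_n^2$ gives a rough empirical view of the variance of the estimator.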

We call an importance sampling measure $Q_n$ asymptotically optimal for
evaluating the tail probability (1.5) if

(2.17)   $E_{Q_n}\left[\left(\frac{dP_n}{dQ_n}\right)^2 1_{\{g(S_n/n) > b\}}\right] = O(\sqrt{n} p_n^2)$.

Under certain regularity conditions, the following theorem shows that $Q_n^*$
is asymptotically optimal. These regularity conditions are the same as those
in Theorem 2 of [4] on large deviation approximations to $P\{g(S_n/n) > b\}$,
which we restate below using the same notation:

(B1) $g$ is continuous on $\Lambda^o$ and
$\inf\{\phi(\mu) : g(\mu) > b\} = b/r$ for some $r > 0$.

(B2) $g$ is twice continuously differentiable on
$\{\mu \in \Lambda^o : b - \varepsilon_0 < g(\mu) < b + \varepsilon_0\}$ for some
$\varepsilon_0 > 0$.

(B3) $\nabla g(\mu) \ne 0$ on $N := \{\mu \in \Lambda^o : g(\mu) = b\}$, and
$M := \{\mu \in \Lambda^o : g(\mu) = b, \phi(\mu) = b/r\}$ is a smooth
$q$-dimensional manifold (possibly with boundary) for some
$0 \le q \le d - 1$.

(B4) $\liminf_{\mu \to \partial\Lambda} \phi(\mu) > br^{-1}$ and
$\inf_{g(\mu) > b + \delta} \phi(\mu) > br^{-1}$ for every $\delta > 0$.

(B5) $\inf_{\mu \in M} |\Pi_\mu'\{\Sigma^{-1}(\mu) - s\nabla^2 g(\mu)\}\Pi_\mu| > 0$
if $d > q + 1$, where $s = \|\nabla\phi(\mu)\|/\|\nabla g(\mu)\|$,
$e_1(\mu) = \nabla\phi(\mu)/\|\nabla\phi(\mu)\|$,
$\{e_1(\mu), e_2(\mu), \ldots, e_{d-q}(\mu)\}$ is an orthonormal basis of
$TM^\perp(\mu)$ which is a $(d - q)$-dimensional linear space in view of (B3),
and $\Pi_\mu$ is the $d \times (d - q - 1)$ matrix
$(e_2(\mu) \cdots e_{d-q}(\mu))$.

Chan and Lai ([4], pages 1646-1648) have given several important statistical
examples in which (B1)-(B5) are satisfied.

Bucklew, Nitinawarat and Wierer [3] have considered an alternative to
$w_n(\mu) \, d\mu$ for the mixing measure in (2.14). Specifically they consider
$Q_n = \int P_{\mu,n} \, dW(\mu)$, in which unlike (2.1), $W$ does not depend on
$n$ and the distribution of $\xi$ and assigns all its mass to
$\{\mu : g(\mu) = b\}$. The price for using these universal simulation
distributions is that (2.17) has to be replaced by a weaker logarithmic
efficiency property

(2.18)   $E_{Q_n}\left[\left(\frac{dP_n}{dQ_n}\right)^2 1_{\{g(S_n/n) > b\}}\right] = p_n^2 e^{o(n)}$ as $n \to \infty$.

The following theorem justifies the definition (2.17) of asymptotic optimality
by showing that $\sqrt{n} p_n^2$ is the minimal order of magnitude for the
left-hand side of (2.17) when $Q_n$ is the joint distribution of i.i.d.
$\xi_1, \ldots, \xi_n$ with distribution $G$ such that

(2.19)   $F(A) > 0 \Rightarrow G(A) > 0$

for any Borel set $A \subset \mathbf{R}^d$, and such that
$\Lambda(\theta) := \log[\int e^{\theta'x} G(dx)] < \infty$ for all
$\|\theta\| \le \theta_1$. More generally, letting
$\Gamma = \{\theta : \Lambda(\theta) < \infty\}$, $G_\theta$ be the distribution
function defined by $dG_\theta(x) = \exp\{\theta'x - \Lambda(\theta)\} \, dG(x)$
for $\theta \in \Gamma$, $\theta_\mu = (\nabla\Lambda)^{-1}(\mu)$ and $W_n$ be a
distribution function on $S := \nabla\Lambda(\Gamma)$, it considers $Q_n$ of the
form

(2.20)   $Q_n = \int Q_{\mu,n} \, dW_n(\mu)$,

where $Q_{\mu,n}$ is the joint distribution of i.i.d. $\xi_1, \ldots, \xi_n$
with common distribution $G_{\theta_\mu}$.

Theorem 2. Assume that $g$ satisfies (B1)-(B5). Let $G$ be a distribution
function on $\mathbf{R}^d$ satisfying (2.19) and such that
$\int e^{\theta'x} \, dG(x) < \infty$ for $\theta$ in some neighborhood of the
origin. Define $Q_n$ from $G$ via (2.20), where $W_n$ is any probability
distribution on $S := \nabla\Lambda(\Gamma)$. Then

(2.21)   $\liminf_{n\to\infty} E_{Q_n}\left[\left(\frac{dP_n}{dQ_n}\right)^2 1_{\{g(S_n/n) > b\}}\right] \Big/ (\sqrt{n} p_n^2) > 0$.

Moreover, (2.17) holds for $Q_n = Q_n^*$, where $Q_n^*$ is defined by (2.13) and
(2.14).
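
Proof. Dividing $g$ and $b$ by $r$, we assume without loss of generality that
$r = 1$. To prove that (2.17) holds for $Q_n = Q_n^*$, let
$B(n; \tilde\mu) = \{\mu : \|\mu - \tilde\mu\| \le n^{-1/2}\}$ be a ball of
radius $n^{-1/2}$ centered at $\tilde\mu$, and we shall show that there exists
$\alpha > 0$ such that

(2.22)   $\int_{B(n;\tilde\mu)} e^{n[\theta_\mu'\tilde\mu - \psi(\theta_\mu)]} w_n(\mu) \, d\mu \ge \alpha n^{-q/2} e^{bn}$ whenever $g(\tilde\mu) > b$.

Note that (2.13) yields

(2.23)   $e^{n[\theta_\mu'\tilde\mu - \psi(\theta_\mu)]} w_n(\mu) = \beta_n e^{n\theta_\mu'(\tilde\mu - \mu)} 1_{\{g(\mu) > b\}}$.

We first assume that $F$ is nonlattice so that we can apply Theorem 2 of [4]
and its proof to show that for some $C > 0$,

(2.24)   $p_n \sim C n^{(q-1)/2} e^{-bn}$,

(2.25)   $\beta_n^{-1} = \int_{g(\mu) > b} e^{-n\phi(\mu)} \, d\mu = O(n^{(q-1-d)/2} e^{-bn})$,

and that there exists $\alpha' > 0$ for which

(2.26)   $\int_{B(n;\tilde\mu) \cap \{\mu : g(\mu) > b\}} e^{n\theta_\mu'(\tilde\mu - \mu)} \, d\mu \ge \alpha' n^{-(d+1)/2}$ whenever $g(\tilde\mu) > b$.

Combining (2.23) with (2.25) and (2.26) yields (2.22) for some $\alpha > 0$. Let
$\bar\xi_n = n^{-1} \sum_{i=1}^n \xi_i$. Then

(2.27)   $E_{Q_n^*}\left[\left(\frac{dP_n}{dQ_n^*}\right)^2 1_{\{g(\bar\xi_n) > b\}}\right] \le E_{Q_n^*}\left[\frac{dP_n}{dQ_n^*} 1_{\{g(\bar\xi_n) > b\}} \Big/ \int_{B(n;\bar\xi_n)} e^{n[\theta_\mu'\bar\xi_n - \psi(\theta_\mu)]} w_n(\mu) \, d\mu\right] \le \alpha^{-1} n^{q/2} e^{-bn} P\{g(S_n/n) > b\}$

by (2.22), noting that
$E_{Q_n^*}[(dP_n/dQ_n^*) 1_{\{g(\bar\xi_n) > b\}}] = P\{g(\bar\xi_n) > b\}$. From
(2.24) and (2.27), it follows that (2.17) holds for $Q_n = Q_n^*$.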

To prove that (2.21) holds for $Q_n$ of the form (2.20), we construct
neighborhoods $U_n$ of $M$ such that $g(\mu) > b$ for $\mu \in U_n$ and

(2.28)   $\liminf_{n\to\infty} P\{S_n/n \in U_n\}/p_n > 0$.

Recall that $e_1(y), \ldots, e_{d-q}(y)$ form an orthonormal basis of
$TM^\perp(y)$ and that $g = \phi$ on $M$. By (B1) and (B3) with $r = 1$,
$\phi - g \ge 0$ on $N$ with equality attained on $M$. Hence, for all
$y \in M$, $\nabla(\phi - g)(y) \in TN^\perp(y)$. Similarly, $g$ is constant on
$N$ and therefore $\nabla g(y) \in TN^\perp(y)$ for all $y \in N$. Since
$TN^\perp(y)$ is of dimension 1, it then follows that for every $y \in M$,
$\nabla g(y)$ is a scalar multiple of
$e_1(y) = \nabla\phi(y)/\|\nabla\phi(y)\|$. For $y \in M$ and
$\max_{1 \le i \le d-q} |v_i| \le n^{-1/2}$, since $g(y) = b$ and
$(\nabla g(y))' \sum_{i=1}^{d-q} v_i e_i(y) = v_1\|\nabla\phi(y)\|/s$, Taylor's
expansion yields

(2.29)   $g\left(y + \sum_{i=1}^{d-q} v_i e_i(y)\right) = b + v_1\|\nabla\phi(y)\|/s + O(v_1^2) - c(v) + o(\|v\|^2)$,

where $v = (v_2, \ldots, v_{d-q})'$ and
$c(v) = -v'\Pi_y'\nabla^2 g(y)\Pi_y v/2$. Let

         $U_n = \left\{ y + \sum_{i=1}^{d-q} v_i e_i(y) : y \in M,\ 2n^{-1} > v_1 - s c(v)/\|\nabla\phi(y)\| > n^{-1},\ \max_{2 \le i \le d-q} |v_i| \le n^{-1/2} \right\}$

and note that $g > b$ on $U_n$ by (2.29). When $m^{-1} S_m$ has a bounded
continuous density $f^{(m)}$ for some $m \ge 1$, the saddlepoint approximation

(2.30)   $f^{(n)}(\mu) = (1 + o(1))(n/2\pi)^{d/2} |\Sigma(\mu)|^{-1/2} e^{-n\phi(\mu)}$

holds uniformly over compact sets of $\mu$, and we can integrate (2.30) over
$U_n$ to obtain

(2.31)   $P\{S_n/n \in U_n\} = (1 + o(1))(n/2\pi)^{d/2} \int_{U_n} |\Sigma(\mu)|^{-1/2} e^{-n\phi(\mu)} \, d\mu$.

More generally, when $F$ is nonlattice, we can use a tilting argument and a
local central limit theorem as in [4], pages 1651-1652, to show that (2.31)
still holds. The integral in (2.31) can be evaluated by the same method
as that in [4], pages 1650-1653, involving a change of variables for tubular
neighborhoods, thereby deriving (2.28) from (2.31) and (2.24).

Let $U_{n,\mu} = \{\sqrt{n}(x - \mu) : x \in U_n\}$ and apply the central limit
theorem to conclude that

(2.32)   $Q_{\mu,n}\{S_n/n \in U_n\} = Q_{\mu,n}\{n^{-1/2}(S_n - n\mu) \in U_{n,\mu}\}$
         $= \int_{U_{n,\mu}} (2\pi)^{-d/2} |\nabla^2\Lambda(\theta_\mu)|^{-1/2} \exp(-z'[\nabla^2\Lambda(\theta_\mu)]^{-1} z/2) \, dz + O(n^{-1/2})$
         $= O(n^{-1/2})$

uniformly in $\mu \in S$. Let $Q_n$ be of the form (2.20). In view of (2.32),

(2.33)   $Q_n\{S_n/n \in U_n\} = O(n^{-1/2})$.
Letting $\gamma_n = Q_n\{S_n/n \in U_n\}$ and defining the probability measure
$\tilde{Q}_n(\cdot) = Q_n(\cdot | S_n/n \in U_n)$, note that
$\gamma_n \le \delta n^{-1/2}$ for some $\delta > 0$ by (2.33) and that

         $E_{Q_n}\left[\left(\frac{dP_n}{dQ_n}\right)^2 1_{\{g(S_n/n) > b\}}\right] \ge E_{Q_n}\left[\left(\frac{dP_n}{dQ_n}\right)^2 1_{\{S_n/n \in U_n\}}\right]$
         $= \gamma_n E_{\tilde{Q}_n}[(dP_n/dQ_n)^2] \ge \gamma_n \{E_{\tilde{Q}_n}(dP_n/dQ_n)\}^2$
         $= \gamma_n \{\gamma_n^{-1} P(S_n/n \in U_n)\}^2 \ge \delta^{-1} \sqrt{n} P^2(S_n/n \in U_n)$.

Therefore (2.21) follows from (2.28).

When $F$ is lattice, we have in place of (2.24),

(2.34)   $0 < \liminf_{n\to\infty} p_n/\{n^{(q-1)/2} e^{-bn}\} \le \limsup_{n\to\infty} p_n/\{n^{(q-1)/2} e^{-bn}\} < \infty$,

and hence (2.17) follows from (2.25)-(2.27). □

Remark. Suppose $F$ is lattice and let $L_0$ (of full rank $d$) be the minimal
lattice of $\xi_1$. In place of (2.30), we now have

(2.35)   $P\{S_n = u\} = (h_0 + o(1))(2\pi n)^{-d/2} |\Sigma(u/n)|^{-1/2} e^{-n\phi(u/n)}$,

uniformly over compact subsets of $u/n$, with $u \in L_0$, where $h_0 > 0$ is
some constant depending only on $L_0$. By summing up (2.35) over
$u/n \in U_n$, we obtain

         $P\{S_n/n \in U_n\} = (h_0 + o(1))(2\pi n)^{-d/2} \sum_{u/n \in U_n, u \in L_0} |\Sigma(u/n)|^{-1/2} e^{-n\phi(u/n)}$,

which can be used to replace (2.31) in the preceding argument.

3. Regeneration, eigenfunctions, eigenmeasures and extension of Theorem 1
to Markov random walks. Let $\{(X_n, S_n) : n = 0, 1, \ldots\}$ be a Markov
additive process on $\mathcal{X} \times \mathbf{R}^d$ with transition kernel

         $P(x, A \times B) := P\{(X_1, S_1) \in A \times (B + s) | (X_0, S_0) = (x, s)\}$
         $= P\{(X_1, S_1) \in A \times B | (X_0, S_0) = (x, 0)\}$,

for any measurable subset $A \subset \mathcal{X}$, Borel set
$B \subset \mathbf{R}^d$ and $s \in \mathbf{R}^d$. We assume that $\{X_n\}$ is
aperiodic and irreducible with respect to some maximal irreducibility measure
$\varphi$. Let $S_0 = 0$ and define $\xi_n = S_n - S_{n-1}$, so that
$S_n = \xi_1 + \cdots + \xi_n$ is a Markov random walk with increments
$\xi_i$. We shall assume the minorization condition

(3.1)    $P(x, A \times B) \ge h(x, B)\nu(A)$

for some probability measure $\nu$ and measure $h(x, \cdot)$ that is positive
for all $x$ belonging to a $\varphi$-positive set. Under (3.1) or its variant
$P(x, A \times B) \ge h(x)\nu(A \times B)$, Ney and Nummelin [15] have shown that
$(X_n, S_n)$ admits a regeneration scheme with i.i.d. inter-regeneration times
for an augmented Markov chain, which is called the "split chain." Letting
$\tau$ be the first time ($> 0$) to reach the atom of the split chain and
assuming that

(3.2)    $\Omega := \{(\theta, \zeta) : E_\nu e^{\theta'S_\tau - \tau\zeta} < \infty\}$ is an open neighborhood of 0,

they have shown that for
$\theta \in \Theta := \{\theta : (\theta, \zeta) \in \Omega$ for some $\zeta\}$,
the kernel $\hat{P}_\theta(x, A) := \int e^{\theta's} P(x, A \times ds)$ has a
maximal simple eigenvalue $e^{\psi(\theta)}$, where $\psi(\theta)$ is the unique
solution of the equation $E_\nu e^{\theta'S_\tau - \tau\psi(\theta)} = 1$, with
corresponding eigenfunction

(3.3)    $r(x; \theta) = E_x e^{\theta'S_\tau - \tau\psi(\theta)}$.

Moreover, $\psi(\theta)$ is strictly convex and analytic on $\Theta$ and there
exists a full set $F$ [i.e., $\varphi(F^c) = 0$] such that

(3.4)    $E_x e^{\theta'S_\tau - \tau\zeta} < \infty$ for all $x \in F$ and $(\theta, \zeta) \in \Omega$.

Therefore, under (3.1) and (3.2), $P$ can be embedded in an exponential
family

(3.5)    $P_\theta(x, dy \times ds) = e^{\theta's - \psi(\theta)} P(x, dy \times ds) r(y; \theta)/r(x; \theta)$, $\theta \in \Theta$.

By (3.1) and (3.5), $P_\theta$ satisfies the minorization condition

(3.6)    $P_\theta(x, A \times B) \ge h_\theta(x, B)\nu_\theta(A)$ where $\nu_\theta(A) = \int_A r(y; \theta)\nu(dy)$

and $h_\theta(x, B) = \int_B h(x, dz) e^{\theta'z - \psi(\theta)}/r(x; \theta)$. Let
$\pi(\theta)$ be the stationary distribution under $P_\theta$ and denote
$\pi(0)$ simply by $\pi$.
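
For a finite state space the tilting formula (3.5) can be made concrete. The sketch below assumes, purely for illustration, a two-state chain whose increment is a deterministic function of the transition, so that the tilted kernel of the display preceding (3.3) becomes the matrix with entries $P(x, y)e^{\theta c(x,y)}$. It computes $e^{\psi(\theta)}$ and a right eigenvector by power iteration and then forms (3.5). The eigenvector is determined only up to a constant, so it need not match the normalization in (3.3); the matrices, the value of $\theta$ and the function names are hypothetical.

```python
import math

def perron(matrix, iters=2000):
    """Power iteration for the largest eigenvalue and a positive right
    eigenvector of a square matrix with strictly positive entries."""
    k = len(matrix)
    vec = [1.0] * k
    lam = 1.0
    for _ in range(iters):
        new = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        lam = max(new)                     # vec has maximum 1, so this tends to the eigenvalue
        vec = [v / lam for v in new]
    return lam, vec

def tilted_kernel(trans, incr, theta):
    """Return psi(theta), an eigenvector r(.; theta) and the tilted matrix of (3.5)."""
    k = len(trans)
    hat = [[trans[x][y] * math.exp(theta * incr[x][y]) for y in range(k)]
           for x in range(k)]
    lam, r = perron(hat)
    psi = math.log(lam)
    tilted = [[hat[x][y] * r[y] / (lam * r[x]) for y in range(k)]
              for x in range(k)]
    return psi, r, tilted

trans = [[0.9, 0.1], [0.4, 0.6]]      # transition probabilities P(x, y)
incr = [[-1.0, 2.0], [0.5, -0.5]]     # increment c(x, y) attached to each transition
psi, r, tilted = tilted_kernel(trans, incr, 0.3)
row_sums = [sum(row) for row in tilted]
psi_zero = tilted_kernel(trans, incr, 0.0)[0]
print(psi, r, row_sums, psi_zero)
```

The row sums equal 1 because $r(\cdot; \theta)$ is a right eigenfunction with eigenvalue $e^{\psi(\theta)}$, which is the property that makes (3.5) a transition kernel.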

For the special case of i.i.d. $\xi_i$, $e^{\psi(\theta)}$ is the moment
generating function $E(e^{\theta'\xi_1})$ and $r(\cdot; \theta) = 1$. Since
$r(x; \theta)$ is uniformly positive and bounded under the uniform recurrence
condition that there exist $b > a > 0$ and a probability measure $\nu$ on
$\mathcal{X} \times \mathbf{R}^d$ for which
$a\nu(A \times B) \le P(x, A \times B) \le b\nu(A \times B)$,
$\forall x \in \mathcal{X}$, and measurable subsets $A$ and $B$ (cf. [11]), it
is straightforward to generalize Theorem 1 to uniformly recurrent Markov
additive processes. While the uniform recurrence assumption covers the case of
finite $\mathcal{X}$, it is too strong for applications to time series and
stochastic dynamical systems. Although the same exponential tilting formula
(3.5) still holds under the minorization condition (3.1), which is much weaker
than uniform recurrence, $r(x; \theta)$ need no longer be uniformly positive
and bounded and its presence in the likelihood ratio statistic
$dP_{\theta,T}/dP_{0,T}$ makes the latter intractable. Thus, Ney and Nummelin
[15, 16] have to restrict $X_n$ to "s-sets" on which $r(X_n; \theta)$ is within
certain bounds when they use (3.5) to analyze large deviation probabilities on
$S_n/n$.

To circumvent the intractability of the likelihood ratio statistic, we make 
use of regeneration times and the representation (3.3) of the eigenfunction 
to construct a modified likelihood ratio martingale in Section 3.1. We then
bound the second moment of the likelihood ratio statistic multiplied by
$1_{\{Tg(S_T/T) > c\}}$ by that of the modified likelihood ratio martingale,
which we analyze by applying renewal theory to the independent blocks between
regeneration times and using an eigenmeasure to bound each of these blocks.
Finiteness of the eigenmeasures has been established in Section 3 of [5]
under certain "drift conditions" of the type in [14], and we weaken these
conditions somewhat in Section 3.1. To highlight the new ideas that are needed
for Markov random walks satisfying the minorization condition (3.1), we
consider in Section 3.2 the special case $d = 1$ and $g(\mu) = \mu$, with
$n_0 = 1$ and $n_1 = \infty$, and prove a general theorem (Theorem 4) that
yields as corollaries

(i) a generalization, to the Markovian setting, of Siegmund's [18] result on
asymptotic optimality of $P_{\theta_*,T_c}$ (degenerate mixture over $\theta$)
for i.i.d. $\xi_i$, and

(ii) a definitive solution of Collamore's [7] closely related problem on
simulating ruin probabilities of multidimensional Markov random walks.
Theorem 4 is also used to generalize Theorems 1 and 2 to the Markovian setting
in Sections 3.3 and 3.4, where comparison with the dynamic importance sampling
method recently developed by Dupuis and Wang [8] is also given.

3.1. A modified likelihood ratio martingale. Let $\mathcal{F}_n$ be the
$\sigma$-field generated by $X_0, \ldots, X_n, \xi_1, \ldots, \xi_n$. Assuming
(3.1), Ney and Nummelin ([16], page 596) have shown how a sequence of
regeneration times $0 < \tau = \tau(1) < \tau(2) < \cdots$ can be constructed
with the following three properties: For $k \ge 1$,

(3.7)    $\tau(k + 1) - \tau(k)$ are i.i.d. random variables;

(3.8)    the random blocks $\{X_{\tau(k)}, \ldots, X_{\tau(k+1)-1}, \xi_{\tau(k)+1}, \ldots, \xi_{\tau(k+1)}\}$ are independent;

(3.9)    $P_x\{X_{\tau(k)} \in A | \mathcal{F}_{\tau(k)-1}, \xi_{\tau(k)}\} = \nu(A)$ for all $x \in \mathcal{X}$ and measurable subsets $A$ of $\mathcal{X}$.

Moreover, for every $n \ge 1$, there exists a measure $h_n(x, \cdot)$ such that

(3.10)   $P_x\{\tau = n$ and $(X_n, \xi_n) \in A \times B\} = \nu(A)h_n(x, B)$ for all $x \in \mathcal{X}$,

which is an extension of the regeneration lemma of Athreya and Ney [1] to
Markov additive processes.

Set $\tau(0) = 0$. Given a stopping time $T$, define the stopping time

(3.11)   $U = \inf\{u > T : u = \tau(k)$ for some $k \ge 1\}$.

For $\theta \in \Theta$, define

(3.12)   $Z_n(\theta) = e^{\theta'S_n - n\psi(\theta)} r(X_n; \theta)$ if $n < U$, and $Z_n(\theta) = e^{\theta'S_U - U\psi(\theta)}$ if $n \ge U$.

Let $\mathcal{G}_n$ be the smallest $\sigma$-field containing
$\mathcal{F}_n \cup \sigma\{\tau(k)1_{\{\tau(k) \le n\}}, k \ge 1\}$.
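
Theorem 3. $Z_n(\theta)$ is a martingale with respect to $\mathcal{G}_n$ under
the transition kernel $P$.

Proof. For simplicity we shall write $Z_n$ instead of $Z_n(\theta)$. Let
$W_n = e^{\theta'S_{n \wedge U} - (n \wedge U)\psi(\theta)} r(X_{n \wedge U}; \theta)$.
Then $W_n$ is a martingale; in fact, (3.5) yields the likelihood ratio
martingale

(3.13)   $\prod_{i=1}^n \{e^{\theta'\xi_i - \psi(\theta)} r(X_i; \theta)/r(X_{i-1}; \theta)\} = e^{\theta'S_n - n\psi(\theta)} r(X_n; \theta)/r(x; \theta)$

under $P$. Combining (3.12) with

         $W_n = e^{\theta'S_n - n\psi(\theta)} r(X_n; \theta)$ if $n < U$, and $W_n = e^{\theta'S_U - U\psi(\theta)} r(X_U; \theta)$ if $n \ge U$,

and noting that $Z_n = W_n$ on $\{U > n\}$, we obtain

         $E[(Z_{n+1} - W_{n+1})1_{\{U > n+1\}} | \mathcal{G}_n] = (Z_n - W_n)1_{\{U > n+1\}} = 0$,
         $E[(Z_{n+1} - W_{n+1})1_{\{U \le n\}} | \mathcal{G}_n] = (Z_n - W_n)1_{\{U \le n\}} = Z_n - W_n$,
         $E[Z_{n+1}1_{\{U = n+1\}} | \mathcal{G}_n] = e^{\theta'S_n - n\psi(\theta)} E_{X_n}[e^{\theta'\xi_1 - \psi(\theta)} 1_{\{\tau = 1\}}] 1_{\{T \le n, U > n\}}$,
         $E[W_{n+1}1_{\{U = n+1\}} | \mathcal{G}_n] = e^{\theta'S_n - n\psi(\theta)} E_{X_n}[e^{\theta'\xi_1 - \psi(\theta)} r(X_1; \theta)1_{\{\tau = 1\}}] 1_{\{T \le n, U > n\}}$.

Since $P_x\{\tau = 1$ and $(X_1, \xi_1) \in A \times B\} = \nu(A)h_1(x, B)$ by
(3.10) and since $\nu(\mathcal{X}) = 1$ and
$\int r(z; \theta)\nu(dz) = E_\nu e^{\theta'S_\tau - \tau\psi(\theta)} = 1$,

         $E_x[e^{\theta'\xi_1 - \psi(\theta)} 1_{\{\tau = 1\}}] = \int e^{\theta'z - \psi(\theta)} h_1(x, dz)$,
         $E_x[e^{\theta'\xi_1 - \psi(\theta)} r(X_1; \theta)1_{\{\tau = 1\}}] = \left[\int e^{\theta'z - \psi(\theta)} h_1(x, dz)\right] \left[\int r(y; \theta)\nu(dy)\right] = \int e^{\theta'z - \psi(\theta)} h_1(x, dz)$.

Therefore, $E[(Z_{n+1} - W_{n+1})1_{\{U = n+1\}} | \mathcal{G}_n] = 0$. It then
follows that $E[(Z_{n+1} - W_{n+1}) | \mathcal{G}_n] = Z_n - W_n$. □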

The preceding proof shows that for a given stopping time $T$ (in particular
the $T_c$ in Section 2.1), we first replace $T$ by the regeneration time $U$ immediately
after $T$ and consider the stopped likelihood ratio martingale $W_n$
that replaces $n$ in (3.13) by $n \wedge U$. The modified likelihood ratio martingale
(3.12) further replaces $r(X_{n \wedge U}; \theta)$ by 1 on the event $\{n \ge U\}$. The reason
why this modification helps is that it enables us to bound each of the independent
blocks in (3.8) up to the stopping time $U$ by some eigenmeasure of
$\mathcal{X}$. For $x \in \mathcal{X}$, define



(3.14) $\ell_x(A; \theta, \zeta) = E_x\left[\sum_{n=0}^{\tau-1} e^{\theta' S_n - n\zeta}\mathbf{1}_{\{X_n \in A\}}\right]$,



and let $\ell_\nu$ denote $\int \ell_x \, d\nu(x)$. Then $\ell_\nu(\cdot; \theta, \psi(\theta))$ is the left eigenmeasure associated
with the eigenvalue $e^{\psi(\theta)}$; see [15, 16]. The finiteness of $\ell_\nu(\mathcal{X}; \theta, \psi(\theta))$
and $\ell_x(\mathcal{X}; \theta, \psi(\theta))$ has been studied by Chan and Lai ([5], pages 406-409)
under certain drift-type conditions. The following lemma considers more
generally $\ell_w(\theta, \zeta) := \ell_w(\mathcal{X}; \theta, \zeta)$ instead of requiring $\zeta = \psi(\theta)$, with $w = x$ or
$\nu$, and can be proved by the same arguments as those used to prove Theorem
4 of [5].

Lemma 1. Assume (3.1) and (3.2). Let Suppose there exist 

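$0 < \beta < 1$, $a > 0$, a measurable subset $C$ of $\mathcal{X}$ with $\ell_\nu(C; \theta, \zeta) < \infty$ and
$\ell_x(C; \theta, \zeta) < \infty$ for all $x \in \mathcal{X}$, and a measurable function $u : \mathcal{X} \to [1, \infty)$ such
that:

(U1) $E_x[e^{\theta'\xi_1 - \zeta} u(X_1)] \le (1 - \beta) u(x)$ for all $x \notin C$,
(U2) $\sup_{x \in C} E_x[e^{\theta'\xi_1 - \zeta} u(X_1)] \le a$ and $\int u(x)\nu(dx) < \infty$.

Then $\ell_\nu(\theta, \zeta) < \infty$ and $\ell_x(\theta, \zeta) < \infty$ for all $x \in \mathcal{X}$.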

3.2. Extension of Siegmund's result on exponential tilting to Markov random
walks. In the case of i.i.d. $\xi_n$ for which $e^{\psi(\theta)}$ is the moment generating
function and whose common mean is negative, Siegmund [18] considered the
stopping times

(3.15) $T_c = \inf\{n \ge 1 : S_n > c\}$, $T' = \inf\{n \ge 1 : S_n < -a\}$,

with $0 < a \le \infty$, and proposed to use the importance sampling measure $P_{\theta_*}$
for Monte Carlo evaluation of $p_c := P\{T_c < T'\}$, where $\theta_*$ is the unique positive
root of $\psi(\theta) = 0$. He also showed that when $P_\theta$ is used as the importance
sampling measure, yielding the unbiased estimator

(3.16) $\hat p_{\theta,c} := e^{-\theta S_{T_c} + T_c\psi(\theta)}\mathbf{1}_{\{T_c < T'\}}$,

the asymptotically optimal choice of $\theta$ as $c \to \infty$ is $\theta_*$ because $E_{\theta_*}\hat p_{\theta_*,c}^2 / E_\theta \hat p_{\theta,c}^2 \to 0$
exponentially fast, for all $\theta \ne \theta_*$. Lehtonen and Nyrhinen [12,
13] considered estimation of $p_c$ for $a = \infty$ and showed that the logarithmic
efficiency property

(3.17) $E_{\theta_*}\hat p_{\theta_*,c}^2 = p_c^2 e^{o(c)}$ as $c \to \infty$

holds when the Markov additive process is uniformly recurrent. In this section
we make use of the tools developed in Section 3.1 to extend these results
to more general Markov random walks and provide a more precise measure
of asymptotic efficiency; see Corollary 1. More importantly, we use these
tools to prove the following theorem in which the stopping time $T$ need not
be of the form (3.15). The theorem, which will be applied in Section 3.3
to generalize Theorem 1 to the Markovian setting, considers the more general
$d$-dimensional case and involves the reciprocal $R_n(\theta, \zeta)$ of a modified
likelihood ratio statistic which is similar to that in (3.12):

(3.18) $R_n(\theta, \zeta) = e^{-\theta' S_n + n\zeta}$.

Let $x_0$ denote the initial state which we assume to belong to the full set $F$
satisfying (3.4).

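Theorem 4. Assume (3.1) and (3.2). Let $T$ be a stopping time and define
$U$ by (3.11). Suppose $(4\theta, 4\zeta)$ and $(-2\theta, -2\zeta)$ belong to $\Omega$, $\ell_{x_0}(4\theta, 4\zeta) +
\ell_\nu(4\theta, 4\zeta) < \infty$ and $\theta' E_{\pi_\theta}\xi_1 \ne \zeta$. Then $E_\theta[R_U^2(\theta, \zeta)\mathbf{1}_{\{\theta' S_T - T\zeta > c\}}] = O(e^{-2c})$
as $c \to \infty$, where $R_n(\theta, \zeta)$ is defined in (3.18).

Corollary 1. Let $d = 1$ and define $T_c$ by (3.15) and

(3.19) $\hat p_{\theta_*,c} = e^{-\theta_* S_{T_c}} [r(x_0; \theta_*)/r(X_{T_c}; \theta_*)]\mathbf{1}_{\{T_c < \infty\}}$.

Assume that $(4\theta_*, 0)$ and $(-2\theta_*, 0)$ belong to $\Omega$ and that $\ell_{x_0}(4\theta_*, 0) + \ell_\nu(4\theta_*, 0) <
\infty$. Then $E_{\theta_*}\hat p_{\theta_*,c}^2 = O(e^{-2\theta_* c}) = O(p_c^2)$ and therefore $P_{\theta_*}$ is an asymptotically
optimal importance sampling measure.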

PROOF. Here and throughout the sequel, if the initial state (or transition
kernel) is not specified under the expectation sign, it is assumed to be $x_0$ (or
$P$). Define $Z_n(\theta)$ by (3.12) with $T = T_c$ in (3.11) and write $Z_n$ instead of
$Z_n(\theta_*)$ for simplicity. Since $f(y) = y^{-1}$ is a convex function, $\{Z_n^{-1}, \mathcal{G}_n, n \ge 1\}$
is a submartingale under $P$ by Theorem 3. Moreover, since $U > T_c$ by (3.11),

(3.20) $Z_{T_c} = e^{\theta_* S_{T_c}} r(X_{T_c}; \theta_*)$.

Therefore by Jensen's inequality,

$E\{Z_U^{-1} \mid T_c < \infty, (X_{T_c}, S_{T_c}) = (x, s)\}$
$= e^{-\theta_* s} E_x[e^{-\theta_* S_\tau}] \ge e^{-\theta_* s}(E_x[e^{\theta_* S_\tau}])^{-1} = e^{-\theta_* s}[r(x; \theta_*)]^{-1}$
$= E\{Z_{T_c}^{-1} \mid T_c < \infty, (X_{T_c}, S_{T_c}) = (x, s)\}$.

Multiplying the above conditional expectations by $\mathbf{1}_{\{T_c < \infty\}}$ and then taking
expectations yields

(3.21) $E(Z_U^{-1}\mathbf{1}_{\{T_c < \infty\}}) \ge E(Z_{T_c}^{-1}\mathbf{1}_{\{T_c < \infty\}})$.

By (3.9), $X_{\tau(k)}$ is independent of $\{X_1, \ldots, X_{\tau(k)-1}, \xi_1, \ldots, \xi_{\tau(k)}\}$ for all $k \ge 1$,
implying that $X_U$ is independent of $(S_U, T_c)$. Therefore,

(3.22) $E_{\theta_*}(Z_U^{-2}\mathbf{1}_{\{T_c < \infty\}}) = E_{\theta_*}(e^{-2\theta_* S_U}\mathbf{1}_{\{T_c < \infty\}})$
$= E[e^{-\theta_* S_U} r(X_U; \theta_*)\mathbf{1}_{\{T_c < \infty\}}]/r(x_0; \theta_*)$
$= E(Z_U^{-1}\mathbf{1}_{\{T_c < \infty\}})/r(x_0; \theta_*)$,

noting that $E[r(X_U; \theta_*)] = \int r(z; \theta_*)\nu(dz) = 1$. Combining (3.18), (3.21)
and (3.22) yields

$E_{\theta_*}[R_U^2(\theta_*, 0)\mathbf{1}_{\{\theta_* S_{T_c} > \theta_* c\}}] = E_{\theta_*}(Z_U^{-2}\mathbf{1}_{\{T_c < \infty\}}) = E(Z_U^{-1}\mathbf{1}_{\{T_c < \infty\}})/r(x_0; \theta_*)$
$\ge E(Z_{T_c}^{-1}\mathbf{1}_{\{T_c < \infty\}})/r(x_0; \theta_*) = E_{\theta_*}(Z_{T_c}^{-2}\mathbf{1}_{\{T_c < \infty\}})$
$= E_{\theta_*}\hat p_{\theta_*,c}^2 / r^2(x_0; \theta_*)$,

where the last equality follows from (3.20). In the nonlattice case, $p_c \sim
A e^{-c\theta_*}$ for some constant $A$; see Theorem 3 of [5]. Without the nonlattice
assumption, the asymptotic formula can be weakened to $(A_1 + o(1))e^{-c\theta_*} \le
p_c \le (A_2 + o(1))e^{-c\theta_*}$. Since $\theta_* E_{\pi_{\theta_*}}\xi_1 > 0$, Corollary 1 then follows from
Theorem 4. □
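
The simplest instance of Corollary 1 can be tried numerically. The following is a minimal sketch, not the authors' code, under the assumption of i.i.d. Gaussian increments $N(m, \sigma^2)$ with $m < 0$ and $a = \infty$ in (3.15), so that $r \equiv 1$, $\psi(\theta) = m\theta + \sigma^2\theta^2/2$ and $\theta_* = -2m/\sigma^2$. The function name `siegmund_estimate` and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def siegmund_estimate(c, mean=-0.5, sd=1.0, runs=2000, seed=1):
    """Estimate p_c = P{T_c < infinity} for a Gaussian walk with negative drift.

    Illustrative assumption: increments are i.i.d. N(mean, sd^2), so r = 1 and
    the estimator (3.19) reduces to exp(-theta_* S_{T_c}).
    """
    rng = random.Random(seed)
    # psi(theta) = mean*theta + sd^2*theta^2/2 has the positive root below.
    theta_star = -2.0 * mean / sd ** 2
    # Under P_{theta_*} the increments are N(mean + theta_* sd^2, sd^2),
    # which has positive drift, so T_c is finite with probability one.
    tilted_mean = mean + theta_star * sd ** 2
    values = []
    for _ in range(runs):
        s = 0.0
        while s <= c:  # T_c = inf{n >= 1 : S_n > c}
            s += rng.gauss(tilted_mean, sd)
        values.append(math.exp(-theta_star * s))
    est = sum(values) / runs
    var = sum((v - est) ** 2 for v in values) / (runs - 1)
    return est, math.sqrt(var / runs), theta_star


if __name__ == "__main__":
    est, se, theta_star = siegmund_estimate(5.0)
    print(est, se, math.exp(-theta_star * 5.0))
```

Every simulated value lies below $e^{-\theta_* c}$ because $S_{T_c} > c$, so the estimate respects the bound $p_c \le e^{-\theta_* c}$ on every run; this bounded likelihood ratio is what makes the second moment of order $e^{-2\theta_* c}$ in this special case.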

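Proof of Theorem 4. For notational simplicity, denote $R_U(\theta, \zeta)$ by
$R_U$. It suffices to show that there exists a constant $B$ such that

(3.23) $E_\theta(R_U^2\mathbf{1}_{\{\theta' S_T - T\zeta > c\}}) \le e^{-2c}\{[\ell_{x_0}(4\theta, 4\zeta) E_{x_0} e^{-2\theta' S_\tau + 2\tau\zeta}]^{1/2}/r(x_0; \theta) + B[\ell_\nu(4\theta, 4\zeta) E_\nu e^{-2\theta' S_\tau + 2\tau\zeta}]^{1/2}\}$.

Let $y_k = \theta'[S_{\tau(k)} - S_{\tau(k-1)}] - \zeta[\tau(k) - \tau(k-1)]$ and $\lambda_k = \max_{\tau(k-1) \le n < \tau(k)}
\{\theta'[S_n - S_{\tau(k-1)}] - \zeta[n - \tau(k-1)]\}$. By (3.7) and (3.8), the random vectors
$(y_k, \lambda_k)$ are i.i.d. for $k \ge 2$. Define the renewal function

(3.24) $V(s) = \sum_{k=2}^{\infty} P_\theta\{y_1 + \cdots + y_{k-1} \le s\}$.

Since $E_\theta[\theta'(S_{\tau(k)} - S_{\tau(k-1)})] = \theta'(E_{\pi_\theta}\xi_1)(E_{\nu_\theta,\theta}\tau)$, $E_\theta y_k \ne 0$ for $k \ge 2$. If
$E_\theta y_k > 0$, it follows from Blackwell's renewal theorem that there exists a
constant $a > 0$ such that $V(s+1) - V(s) \le a$ for all $s \in \mathbf{R}$. We can then use
this bound in

(3.25) $E_\theta(R_U^2\mathbf{1}_{\{\theta' S_T - T\zeta > c\}})$
$= \sum_{k=1}^{\infty} E_\theta(e^{-2(y_1 + \cdots + y_k)}\mathbf{1}_{\{\tau(k-1) \le T < \tau(k),\ \theta' S_T - T\zeta > c\}})$
$\le E_\theta(e^{-2y_1}\mathbf{1}_{\{\lambda_1 > c\}}) + \sum_{k=2}^{\infty}\sum_{s=-\infty}^{\infty} E_\theta(e^{-2(s + y_k)}\mathbf{1}_{\{\lambda_k > c-s-1\}}) P_\theta\{s \le y_1 + \cdots + y_{k-1} < s+1\}$
$\le e^{-2c}\left\{E_\theta(e^{2(\lambda_1 - y_1)}) + a\sum_{s=-\infty}^{\infty} e^{2(c-s)} E_{\nu_\theta,\theta}(e^{-2y_1}\mathbf{1}_{\{\lambda_1 > c-s-1\}})\right\}$,

noting that for $k \ge 2$, $E_\theta(e^{-2y_k}\mathbf{1}_{\{\lambda_k > c-s-1\}}) = E_{\nu_\theta,\theta}(e^{-2y_1}\mathbf{1}_{\{\lambda_1 > c-s-1\}})$ since
$X_{\tau(k-1)}$ has distribution $\nu_\theta$; see (3.9) with $\nu$ replaced by $\nu_\theta$. If $E_\theta y_k < 0$,
then $\sum_{k=2}^{\infty} P_\theta\{s \le y_1 + \cdots + y_{k-1} < s+1\}$ is also bounded by $a$ (sufficiently
large) for all $s \in \mathbf{R}$, so (3.25) still holds. Note that (3.9) basically says that
$X_\tau$ is an "atom" independent of the past history $\{X_0, \ldots, X_{\tau-1}, \xi_1, \ldots, \xi_\tau\}$,
and therefore in particular is independent of $(y_1, \lambda_1)$. Since $E\,r(X_\tau; \theta) =
\int r(z; \theta)\nu(dz) = 1$, it then follows that

(3.26) $E_\theta(e^{2(\lambda_1 - y_1)}) = E[e^{2\lambda_1 - y_1} r(X_\tau; \theta)]/r(x_0; \theta) = E(e^{2\lambda_1 - y_1})/r(x_0; \theta)$;

(3.27) $\sum_{s=-\infty}^{\infty} e^{2(c-s)} E_{\nu_\theta,\theta}(e^{-2y_1}\mathbf{1}_{\{\lambda_1 > c-s-1\}})$
$\le \sum_{s=-\infty}^{\infty} \int_{c-s-2}^{c-s-1} e^{2(t+2)} E_{\nu_\theta,\theta}(e^{-2y_1}\mathbf{1}_{\{\lambda_1 > t\}})\,dt$
$= e^4 \int_{-\infty}^{\infty} e^{2t} E_{\nu_\theta,\theta}(e^{-2y_1}\mathbf{1}_{\{\lambda_1 > t\}})\,dt = e^4 E_{\nu_\theta,\theta}\left(e^{-2y_1}\int_{-\infty}^{\lambda_1} e^{2t}\,dt\right)$
$= e^4 E_{\nu_\theta,\theta}(e^{2(\lambda_1 - y_1)})/2 = e^4 E_\nu(e^{2\lambda_1 - y_1})/2$.

The last equality of (3.27) follows from

$E_{\nu_\theta,\theta}(e^{2(\lambda_1 - y_1)}) = \int E_x[e^{2\lambda_1 - y_1} r(X_\tau; \theta)/r(x; \theta)]\, r(x; \theta)\,d\nu(x)$,

since $d\nu_\theta(x) = r(x; \theta)\,d\nu(x)$ by (3.6). By the Cauchy-Schwarz inequality and
the definition of $\ell_w$ in (3.14),

(3.28) $E_w(e^{2\lambda_1 - y_1}) \le [E_w(e^{4\lambda_1}) E_w(e^{-2y_1})]^{1/2} \le [\ell_w(4\theta, 4\zeta) E_w e^{-2\theta' S_\tau + 2\tau\zeta}]^{1/2}$

for $w = x_0$ and $\nu$. From (3.25)-(3.28), (3.23) follows. □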



In the case $d > 1$, Collamore [7] considered the stopping time

(3.29) $T_c = \inf\{n : S_n \in cA\}$

as a generalization of (3.15), where $A \subset \mathbf{R}^d$ and $cA = \{c\mu : \mu \in A\}$. Assume
that

(3.30) $A$ is a convex set such that $\partial A$ is a smooth submanifold and $E_\pi \xi_1 \notin cA$
for all $c > 0$.

Then there exist unique $\theta_*$ and $a \in \partial A$ such that $\psi(\theta_*) = 0$ and $\theta_*'(\mu -
a) \ge 0$ for all $\mu \in A$; see Lemma 3.2 of [7] that proposes to use (3.19), with
$\theta_* S_{T_c}$ replaced by $\theta_*' S_{T_c}$, to estimate $p_c = P\{T_c < \infty\}$ in this multidimensional
setting. Under certain regularity conditions, Collamore [7] proved the
logarithmic efficiency property (3.17). By applying Theorem 4, we can improve
(3.17) by providing a much more precise bound on $E_{\theta_*}\hat p_{\theta_*,c}^2 / p_c^2$, thereby
establishing the asymptotic optimality of $P_{\theta_*}$.

Corollary 2. Assume that (3.30) holds, that $(4\theta_*, 0)$ and $(-2\theta_*, 0)$
belong to $\Omega$, and that $\ell_{x_0}(4\theta_*, 0) + \ell_\nu(4\theta_*, 0) < \infty$. Then

(3.31) $E_{\theta_*}\hat p_{\theta_*,c}^2 = O(e^{-2c\theta_*' a}) = O(p_c^2)$ as $c \to \infty$.

The derivation of Corollary 2 from Theorem 4 uses the same arguments
as those used to prove Corollary 1. In particular, note that

$p_c = E_{\theta_*}[e^{-\theta_*' S_{T_c}} r(x_0; \theta_*)/r(X_{T_c}; \theta_*)] \sim B e^{-\theta_*' a c}$

for some constant $B$ in the nonlattice case, as can be shown by a modification
of the proof of Theorem 3 of [5]. This asymptotic formula for $p_c$ can be
weakened to $(B_1 + o(1))e^{-\theta_*' a c} \le p_c \le (B_2 + o(1))e^{-\theta_*' a c}$ in the lattice case.

3.3. Extension of Theorem 1 to the Markov setting. Define $w_c$ by (2.1)
and let $Q^* = \int P_{\mu, T_c \wedge n_1} w_c(\mu)\,d\mu$, where $P_\mu$ denotes the transition kernel $P_{\theta_\mu}$.
The following theorem, whose proof is given in the Appendix, generalizes
Theorem 1 to Markov additive processes. It shows that $p_c$ can be estimated
efficiently by $\hat p_c = L_c\mathbf{1}_{\{T_c \le n_1\}}$, where

(3.32) $\dfrac{1}{L_c} = \dfrac{dQ^*}{dP_{T_c \wedge n_1}}(\xi_1, \ldots, \xi_{T_c \wedge n_1}) = \int_\Lambda \exp[\theta_\mu' S_{T_c \wedge n_1} - (T_c \wedge n_1)\psi(\theta_\mu)]\, \dfrac{r(X_{T_c \wedge n_1}; \theta_\mu)}{r(x_0; \theta_\mu)}\, w_c(\mu)\,d\mu$,

and $(\xi_1, \ldots, \xi_{T_c \wedge n_1})$ is generated from $Q^*$. Note that the set $\Lambda^*$ in (2.1) has
a compact closure under (A1) and (A4).

Theorem 5. Assume (A1)-(A5). If $(4\theta_\mu, 4\psi(\theta_\mu))$ and $(-2\theta_\mu, -2\psi(\theta_\mu))$
belong to $\Omega$ and $\ell_{x_0}(4\theta_\mu, 4\psi(\theta_\mu)) + \ell_\nu(4\theta_\mu, 4\psi(\theta_\mu)) < \infty$ for all $\mu \in \Lambda^*$, then
$E_{Q^*}(L_c^2\mathbf{1}_{\{T_c \le n_1\}}) = O(p_c^2)$, where $L_c$ is defined in (3.32).

3.4. Extension of Theorem 2 to Markov additive processes. First consider
the case $d = 1$ and $g(\mu) = \mu$. Bucklew, Ney and Sadowsky [2] considered
importance sampling for Monte Carlo evaluation of $p_n := P\{S_n/n \ge b\}$ for
uniformly recurrent Markov additive processes with $E_\pi \xi_1 < b$. They showed
that for all Markov kernels $Q \ne P_b$ satisfying $P \ll Q$, $E_b \hat p_n^2 / E_Q \hat p_{n,Q}^2 \to 0$
exponentially fast as $n \to \infty$, where $\hat p_{n,Q} = (dP/dQ)(\xi_1, \ldots, \xi_n)\mathbf{1}_{\{S_n/n \ge b\}}$ and

(3.33) $\hat p_n = e^{-\theta_b S_n + n\psi(\theta_b)}[r(x_0; \theta_b)/r(X_n; \theta_b)]\mathbf{1}_{\{S_n/n \ge b\}}$.

Their proof uses the property that $r(X_n; \theta_b)$ is bounded away from 0 and
therefore it suffices to analyze the exponential term $e^{-\theta_b S_n + n\psi(\theta_b)}$. The following
theorem, whose proof is given in the Appendix, considers more general
Markov additive processes in which the eigenfunctions need not be uniformly
positive and shows that $P_b$ is still an asymptotically optimal importance
measure. It provides a more precise bound on $E_b \hat p_n^2 / p_n^2$ than that
provided by Bucklew, Ney and Sadowsky [2].

Theorem 6. Suppose $d = 1$, $g(\mu) = \mu$, and define $\hat p_n$ by (3.33). Assume
that $(-2\theta_b, -2\psi(\theta_b))$ and $(4\theta_b, \zeta)$ belong to $\Omega$ for some $\zeta < 4\psi(\theta_b)$ and that
$\ell_{x_0}(4\theta_b, \zeta) + \ell_\nu(4\theta_b, \zeta) < \infty$. Then $E_b \hat p_n^2 = O(\sqrt{n}\, p_n^2)$.

We next consider the general setting of Theorem 2 and extend it to
Markov additive processes. To estimate $p_n := P_{x_0}\{g(S_n/n) \ge b\}$ by Monte
Carlo simulations using the importance measure (2.13)-(2.14), the $L_n^{(i)}$ in
the estimate (2.15) is given by



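(3.34) $L_n^{(i)} = \dfrac{dP_n}{dQ_n^*}(\xi_1^{(i)}, \ldots, \xi_n^{(i)}) = \left[\int_\Lambda e^{\theta_\mu' S_n^{(i)} - n\psi(\theta_\mu)}\, \dfrac{r(X_n^{(i)}; \theta_\mu)}{r(x_0; \theta_\mu)}\, w_n(\mu)\,d\mu\right]^{-1}$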

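in the Markovian setting, where $\theta_\mu$ is the solution of $\nabla\psi(\theta) = \mu$ (see [15],
Lemma 3.5, for existence of $\theta_\mu$).

Theorem 7. Assume (B1)-(B5), and define $\hat p_n$ by (2.15) with $L_n^{(i)}$ given
by (3.34). Assume that for each $\mu$ in a neighborhood of $M$ [see (B3)], there
exists $\zeta_\mu < 4\psi(\theta_\mu)$ such that $(4\theta_\mu, \zeta_\mu)$ and $(-2\theta_\mu, -2\psi(\theta_\mu))$ belong to $\Omega$ and

$\ell_{x_0}(4\theta_\mu, \zeta_\mu) + \ell_\nu(4\theta_\mu, \zeta_\mu) < \infty$.

Then $E_{Q_n^*}\hat p_n^2 = O(\sqrt{n}\, p_n^2)$.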



The proof of Theorem 7 is given in the Appendix. Instead of using the
method of mixtures to construct the importance sampling measure, Dupuis
and Wang [8] proposed to perform importance sampling via adaptive choice
of the tilting parameter at each step to simulate $P\{S_n/n \in A\}$ for uniformly
recurrent Markov additive processes. Suppose $(X_k, S_k) = (x, s)$ has
been generated. Let $A_k = \{(na - s)/(n - k) : a \in A\}$. Their dynamic importance
sampling method chooses $\mu_k \in A_k$ such that $\phi(\mu_k) = \inf\{\phi(a) : a \in A_k\}$
and generates $(X_{k+1}, \xi_{k+1})$ from $P_{\theta_{\mu_k}}(x, \cdot)$. Under certain regularity conditions
on $A$, they have established the logarithmic efficiency property (2.18)
of the method.
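
The adaptive rule just described can be sketched in a few lines. The code below is an illustrative sketch only, not Dupuis and Wang's implementation: it assumes i.i.d. $N(0,1)$ increments and $A = [a, \infty)$, so that $\phi(\mu) = \mu^2/2$, $\theta_\mu = \mu$, and the minimizer of $\phi$ over $A_k$ is $\max\{(na - s)/(n - k), 0\}$. The names `dynamic_is` and `exact_tail` are hypothetical.

```python
import math
import random


def dynamic_is(n, a, runs=20000, seed=7):
    """Adaptive-tilt estimate of P{S_n/n >= a} for i.i.d. N(0,1) increments."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(runs):
        s = 0.0
        log_lr = 0.0
        for k in range(n):
            # phi(mu) = mu^2/2 is minimised over A_k = [(n*a - s)/(n - k), inf)
            # at the lower endpoint if it is positive, and at 0 otherwise.
            mu_k = max((n * a - s) / (n - k), 0.0)
            xi = rng.gauss(mu_k, 1.0)  # theta_mu = mu for N(0,1) increments
            # log of dP/dP_theta for this step: -theta*xi + psi(theta)
            log_lr += -mu_k * xi + 0.5 * mu_k * mu_k
            s += xi
        w = math.exp(log_lr) if s >= n * a else 0.0
        total += w
        total_sq += w * w
    est = total / runs
    se = math.sqrt(max(total_sq / runs - est * est, 0.0) / runs)
    return est, se


def exact_tail(n, a):
    """Exact P{S_n/n >= a} for N(0,1) increments, for comparison."""
    return 0.5 * math.erfc(a * math.sqrt(n) / math.sqrt(2.0))


if __name__ == "__main__":
    print(dynamic_is(20, 0.5), exact_tail(20, 0.5))
```

In this Gaussian special case the exact tail probability is available in closed form, which makes it easy to check that the adaptive estimate and its standard error are sensible before moving to a Markov additive process, where the tilt would also involve the eigenfunction $r(\cdot; \theta)$.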

4. Implementation and examples. Since $Q_n^*$ is a mixture of $P_{\mu,n}$ with
mixing distribution $W_n$ that has density function (2.13) with respect to
Lebesgue measure, we can draw the $\xi^{(i)}$ from $Q_n^*$ by generating $m$ i.i.d.
vectors $(\mu^{(i)}, \xi_1^{(i)}, \ldots, \xi_n^{(i)})$ as follows: Generate $\mu^{(i)}$ from $W_n$ and then generate
$\xi_1^{(i)}, \ldots, \xi_n^{(i)}$ from $P_{\mu^{(i)}}$ in the i.i.d. case, and $X_1^{(i)}, \xi_1^{(i)}, \ldots, X_n^{(i)}, \xi_n^{(i)}$ from
$P_{\mu^{(i)}}$ in the Markov case. These $m$ simulated vectors are used to evaluate
$p_n$ by Monte Carlo via (2.15). Likewise, to evaluate $p_c$ by Monte Carlo,
we generate $m$ independent trajectories $(\xi_1^{(i)}, \ldots, \xi^{(i)}_{T_c^{(i)} \wedge n_1})$ from $P_{\mu^{(i)}}$, where
$\mu^{(i)}$ is generated from the distribution with density function $w_c$ given in
(2.1). Note that (2.1) and (2.13) involve normalizing constants $\beta_c$ and $\beta_n$.
Instead of using the asymptotically optimal mixture density (2.1), it is often
more convenient to use variants thereof that also yield asymptotically optimal
importance sampling measures. For example, suppose $v_c(\mu)$ is a density
function satisfying

(4.1) $\inf_{\mu : w_c(\mu) > 0} [v_c(\mu)/w_c(\mu)] \ge \varepsilon > 0$,

and let $Q_v$ and $\hat p_v$ denote the corresponding importance sampling measure
and associated estimator of $p_c$, respectively. Then $Q_v$ is also an asymptotically
optimal importance sampling measure: (4.1) gives $dQ_v \ge \varepsilon\, dQ^*$, so the
second moment of $\hat p_v$ is at most $\varepsilon^{-1}$ times that of the estimator based on $Q^*$.
This property provides us with
the flexibility of choosing an importance density $v_c(\mu)$ that does not involve
difficult calculation of normalizing constants and such that the likelihood
ratio

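(4.2) $\int_\Lambda e^{\theta_\mu' S_{T_c \wedge n_1} - (T_c \wedge n_1)\psi(\theta_\mu)}\, [r(X_{T_c \wedge n_1}; \theta_\mu)/r(x_0; \theta_\mu)]\, v_c(\mu)\,d\mu$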

has a closed-form expression or can be easily computed by numerical integration.
A statistical application illustrating this point is provided by Chan
and Lai ([6], pages 266-268) whose Table 2 shows a large variance reduction
over direct Monte Carlo by using a convenient asymptotically optimal
importance sampling measure $Q_v$ satisfying (4.1).

Instead of using the mixture of $P_\mu$ with mixing density $w_c$ in (2.1) or $w_n$
in (2.13), asymptotically optimal importance sampling measures can also
be attained by using discrete mixtures of $P_\mu$ whose likelihood ratios do not
involve numerical integration. To fix the ideas, first consider the boundary
crossing probability $p_c$. Defining

(4.3) $K_c(\mu) = \prod_{i=1}^{d} [\mu_i, \mu_i + c^{-1/2})$ for $\mu = (\mu_1, \ldots, \mu_d)' \in \mathbf{R}^d$,

and letting $\Lambda_c = \{\mu \in (c^{-1/2}\mathbf{Z})^d : K_c(\mu) \cap \Lambda^* \ne \emptyset\}$, a discrete analogue of
(2.1) is the probability mass function

w*M = M[9^)]- d/2 e- cM/9M l {fieAc} 

(4-4) + S d ^e-^h {m>(Sr) -t +£l/2} }, fx € (C^Zf, 

where $\beta_c$ is a normalizing constant so that $\sum_{\mu \in (c^{-1/2}\mathbf{Z})^d} w_c^*(\mu) = 1$. The proof
of Theorem 5 shows that the theorem still holds if (2.1) is replaced by the
probability mass function (4.4). Note that for the special case $d = 1$ and
$g(\mu) = \mu$, Corollary 1 only involves a single $P_{\theta_*}$ for the discrete mixture. We
next generalize this result to finite mixtures (with support independent of
$c$). With $r$ and $\delta$ given in (A1)-(A5), let

(4.5) $J(\mu) = \{s \in \Lambda : r[\theta_\mu' s - \psi(\theta_\mu)] \ge \min[\delta^{-1}, g(s)]\}$.

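Corollary 3. Assume (A1)-(A5) with $q = 0$, $n_0 = \delta c$ and $n_1 = ac$
and define $J(\mu)$ by (4.5). Suppose there exists a finite set $G$ such that
$\{\mu \in \Lambda : g(\mu) > a^{-1}\} \subset \bigcup_{\mu \in G} J(\mu)$. If $(4\theta_\mu, 4\psi(\theta_\mu))$ and $(-2\theta_\mu, -2\psi(\theta_\mu))$ belong
to $\Omega$ and $\ell_{x_0}(4\theta_\mu, 4\psi(\theta_\mu)) + \ell_\nu(4\theta_\mu, 4\psi(\theta_\mu)) < \infty$ for every $\mu \in G$, then
$\sum_{\mu \in G} \omega_\mu P_\mu$ is an asymptotically optimal importance sampling measure for
any choice of weights $\omega_\mu$ such that $\min_{\mu \in G} \omega_\mu > 0$ and $\sum_{\mu \in G} \omega_\mu = 1$.

Proof. Let $Q = \sum_{\mu \in G} \omega_\mu P_\mu$. Since $n_1 = ac$, $g(S_{T_c}/T_c) > a^{-1}$ on $\{T_c \le
n_1\}$. Since $\{\mu \in \Lambda : g(\mu) > a^{-1}\} \subset \bigcup_{\mu \in G} J(\mu)$ and since $L_{T_c} = dP_{T_c}/dQ_{T_c} \le
\omega_\mu^{-1}\, dP_{T_c}/dP_{\mu,T_c}$ for every $\mu \in G$, it then follows that

(4.6) $E_Q[L_{T_c}^2\mathbf{1}_{\{T_c \le n_1\}}] \le \sum_{\mu \in G} E_Q[L_{T_c}^2\mathbf{1}_{\{S_{T_c}/T_c \in J(\mu),\ T_c \le n_1\}}]$
$= \sum_{\mu \in G} E[L_{T_c}\mathbf{1}_{\{S_{T_c}/T_c \in J(\mu),\ T_c \le n_1\}}]$
$\le \sum_{\mu \in G} \omega_\mu^{-1} E_\mu[(dP_{T_c}/dP_{\mu,T_c})^2\mathbf{1}_{\{S_{T_c}/T_c \in J(\mu),\ T_c \le n_1\}}]$.

Since $T_c \ge n_0 = \delta c$ and $T_c\, g(S_{T_c}/T_c) > c$, it follows from (4.5) that $\{S_{T_c}/T_c \in
J(\mu)\} \subset \{\theta_\mu' S_{T_c} - T_c\psi(\theta_\mu) \ge c/r\}$. Therefore, by Theorem 4 [with $\zeta = \psi(\theta_\mu)$]
and the proof of Corollary 1,

(4.7) $E_\mu[(dP_{T_c}/dP_{\mu,T_c})^2\mathbf{1}_{\{S_{T_c}/T_c \in J(\mu),\ T_c \le n_1\}}] = O(e^{-2c/r})$

for every $\mu \in G$. Combining (4.6) with (4.7) yields $E_Q[L_{T_c}^2\mathbf{1}_{\{T_c \le n_1\}}] =
O(e^{-2c/r}) = O(p_c^2)$, in view of (2.12) with $q = 0$ and with $c$ replaced by
the more general form $c/r$. □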

Note that Corollary 1 is a special case of Corollary 3 for $d = 1$, $g(x) = x$,
$\delta = 0$, $a = \infty$ and $G = \{\mu_*\}$, where $\mu_* = \psi'(\theta_*)$. In this special case, since
$r = \theta_*^{-1}$ and $\psi(\theta_*) = 0$, $\{\mu \in \Lambda : g(\mu) > a^{-1}\} = \{\mu : \mu > 0\} \subset J(\mu_*)$. Finite
mixtures of the form $Q = \sum_{\mu \in G} \omega_\mu P_{\mu,n}$ are also asymptotically optimal for
estimating $p_n = P\{g(S_n/n) \ge b\}$ under conditions similar to those in Corollary 3.
Motivated by Glasserman and Wang's [10] example for Monte Carlo
evaluation of $P\{|S_n| > an\}$, the following corollary of Theorems 4 and 6
considers more general finite mixtures of the form $\sum_{\mu \in G} \omega_{\mu,n} P_{\mu,n}$. Its proof
is given in the Appendix.
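
For orientation, a two-point mixture for the two-sided tail probability $P\{|S_n| > an\}$ can be sketched as follows. This is a minimal illustration under assumptions that are not from the source: the increments are i.i.d. $N(0,1)$, so $\theta_{\pm a} = \pm a$ and $\psi(\theta) = \theta^2/2$, and the two components get equal weight by symmetry. The names `mixture_is` and `exact_two_sided` are hypothetical.

```python
import math
import random


def mixture_is(n, a, runs=20000, seed=3):
    """Estimate P{|S_n| > a*n} with an equal-weight mixture of two tilts.

    Illustrative assumption: i.i.d. N(0,1) increments, so the tilted
    measures are N(+a, 1) and N(-a, 1) and psi(theta) = theta^2/2.
    """
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(runs):
        mu = a if rng.random() < 0.5 else -a  # pick a mixture component
        s = sum(rng.gauss(mu, 1.0) for _ in range(n))
        if abs(s) > a * n:
            # dQ/dP = 0.5*exp(a*s - n*a^2/2) + 0.5*exp(-a*s - n*a^2/2)
            dq_dp = math.exp(-0.5 * n * a * a) * math.cosh(a * s)
            w = 1.0 / dq_dp
            total += w
            total_sq += w * w
    est = total / runs
    se = math.sqrt(max(total_sq / runs - est * est, 0.0) / runs)
    return est, se


def exact_two_sided(n, a):
    """Exact P{|S_n| > a*n} for N(0,1) increments, for comparison."""
    return math.erfc(a * math.sqrt(n) / math.sqrt(2.0))


if __name__ == "__main__":
    print(mixture_is(16, 1.0), exact_two_sided(16, 1.0))
```

The point of the mixture is visible in the likelihood ratio: on the event $\{|S_n| > an\}$ the factor $\cosh(aS_n)$ is large whichever side is crossed, so $dP/dQ$ stays bounded by about $2e^{-na^2/2}$. A single tilt toward $+a$ would leave paths that cross at $-an$ with a very large likelihood ratio.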

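Corollary 4. Assume (B1)-(B5) for $q = 0$. Suppose there exists a finite
set $G$ such that $g(\mu) \ge b$ for all $\mu \in G$, $\min_{\mu \in G} \phi(\mu) = b/r$ and $\{\mu \in
\Lambda : g(\mu) \ge b\} \subset \bigcup_{\mu \in G} H(\mu)$, where $H(\mu) = \{s \in \Lambda : \theta_\mu'(s - \mu) \ge 0\}$. Assume
also that for each $\mu \in G$, $(-2\theta_\mu, -2\psi(\theta_\mu))$ and $(4\theta_\mu, \zeta_\mu)$ belong to $\Omega$ for
some $\zeta_\mu < 4\psi(\theta_\mu)$ and $\ell_{x_0}(4\theta_\mu, \zeta_\mu) + \ell_\nu(4\theta_\mu, \zeta_\mu) < \infty$. Then $\sum_{\mu \in G} \omega_{\mu,n} P_{\mu,n}$
is an asymptotically optimal importance sampling measure for any choice of
positive weights $\omega_{\mu,n}$ such that $\sum_{\mu \in G} \omega_{\mu,n} = 1$ and

(4.8) $\liminf_{n \to \infty} \omega_{\mu,n}/e^{-n[\phi(\mu) - b/r]} > 0$ for all $\mu \in G$.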

Example 1. For the tail probability $P\{|S_n| > an\}$ considered by Glasserman
and Wang [10], $\{u : |u| > a\} \subset H(a) \cup H(-a)$, so we can apply Corollary
4 with $G = \{a, -a\}$. Note that their choice of the mixture weights $\omega_{\mu,n} =
e^{-n\phi(\mu)}/[e^{-n\phi(a)} + e^{-n\phi(-a)}] = e^{-n\phi(\mu)}/\{(1 + O(1))e^{-bn/r}\}$ satisfies (4.8) and
that $\min\{\phi(a), \phi(-a)\} = b/r$. We study the performance of this importance
sampling measure in a more general example of a Markov additive process
in which the underlying Markov chain $\{X_n\}_{n \ge 0}$ has state space $\{1, 2, 3\}$ and
transition matrix $(p_{xy})_{x,y \in \mathcal{X}}$ such that $p_{xx} = 0.5$ for every $x$, $p_{12} = p_{23} =
p_{31} = 0.3$, $p_{13} = p_{21} = p_{32} = 0.2$. Letting $\xi_n = X_n$, so that $S_n = X_1 + \cdots + X_n$,
consider the Monte Carlo evaluation of $P_\pi\{S_n/n \ge 2.7 \text{ or } S_n/n \le 1.5\}$, where
$\pi$ is the stationary distribution with $\pi(1) = \pi(2) = \pi(3) = 1/3$; this corresponds
to Corollary 4 with $g(\mu) = (\mu - 2.1)^2$ and $\sqrt{b} = 0.6$. Table 1 compares
direct Monte Carlo evaluation of this probability with two importance sampling
procedures, the first using $Q_n = P_{1.5,n}$ that tilts to the minimum rate
point $\mu = 1.5$, and the second using the mixture

(4.9) $Q_n = \omega_n P_{1.5,n} + (1 - \omega_n) P_{2.7,n}$ with $\omega_n = e^{-n\phi(1.5)}/(e^{-n\phi(1.5)} + e^{-n\phi(2.7)})$,

as advocated by Glasserman and Wang [10]. The eigenvalues and eigenvectors
used to define $P_{1.5,n}$ and $P_{2.7,n}$ are given by

$\theta_{1.5} = -0.507$, with eigenvalue $0.688$; $\theta_{2.7} = 0.815$, with eigenvalue $3.11$;
$r(1; \theta_{1.5}) = 1.20$, $r(2; \theta_{1.5}) = 0.88$, $r(3; \theta_{1.5}) = 0.92$;
$r(1; \theta_{2.7}) = 0.747$, $r(2; \theta_{2.7}) = 1.02$, $r(3; \theta_{2.7}) = 1.23$.

Moreover, $\phi(1.5) = 0.120$ and $\phi(2.7) = 0.251$ are used to evaluate $\omega_n$. Each
result in Table 1 is a Monte Carlo estimate based on $m = 10{,}000$ simulation
runs, and the standard error of the estimate is given in parentheses. Table
1 shows that direct Monte Carlo has much larger standard error than importance
sampling with (4.9) and that it is unable to provide a meaningful
estimate when the probabilities are smaller than $10^{-4}$. The minimum rate
point method ($Q_n = P_{1.5,n}$) has much larger standard error than (4.9) for
$n = 10$ and tends to underestimate the true probability for $n = 20$.
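
The quantities in Example 1 can be recomputed with a short script. The sketch below is an illustration rather than the authors' code: it assumes the tilted kernel is $p_{xy}e^{\theta y}$ with $e^{\psi(\theta)}$ its largest eigenvalue, finds the eigenpair by power iteration, and compares a two-point mixture estimate in the spirit of (4.9) with an exact value obtained by dynamic programming. All function names are hypothetical. Under this convention the listed eigenvalues 0.688 and 3.11 appear to correspond to $e^{\psi(\theta) - \theta}$; that reading is an observation from the sketch, not a statement made in the source.

```python
import math
import random

# Transition matrix of Example 1; list index 0, 1, 2 stands for state 1, 2, 3.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]


def perron(theta, iters=500):
    """Power iteration for the tilted kernel p_xy * exp(theta * y)."""
    r = [1.0, 1.0, 1.0]
    lam = 1.0
    for _ in range(iters):
        new = [sum(P[x][y] * math.exp(theta * (y + 1)) * r[y] for y in range(3))
               for x in range(3)]
        lam = sum(new) / sum(r)             # eigenvalue estimate
        total = sum(new)
        r = [3.0 * v / total for v in new]  # renormalise so that sum(r) = 3
    return lam, r


def rate(theta, mu):
    """theta*mu - psi(theta), which equals phi(mu) when theta = theta_mu."""
    lam, _ = perron(theta)
    return theta * mu - math.log(lam)


def in_event(s, n):
    # {S_n/n >= 2.7 or S_n/n <= 1.5}, written with integers to avoid rounding.
    return 10 * s >= 27 * n or 10 * s <= 15 * n


def exact_prob(n):
    """Exact probability by dynamic programming over (state, partial sum)."""
    dist = {(x, 0): 1.0 / 3.0 for x in range(3)}
    for _ in range(n):
        nxt = {}
        for (x, s), pr in dist.items():
            for y in range(3):
                key = (y, s + y + 1)
                nxt[key] = nxt.get(key, 0.0) + pr * P[x][y]
        dist = nxt
    return sum(pr for (x, s), pr in dist.items() if in_event(s, n))


def mixture_is(n, thetas=(-0.507, 0.815), mus=(1.5, 2.7), runs=10000, seed=11):
    """Two-point mixture importance sampling in the spirit of (4.9)."""
    rng = random.Random(seed)
    eig = [perron(t) for t in thetas]
    phis = [t * m - math.log(lam) for t, m, (lam, _) in zip(thetas, mus, eig)]
    w0 = 1.0 / (1.0 + math.exp(-n * (phis[1] - phis[0])))  # weight of P_{1.5,n}
    weights = (w0, 1.0 - w0)
    total = total_sq = 0.0
    for _ in range(runs):
        j = 0 if rng.random() < w0 else 1
        theta, (lam, r) = thetas[j], eig[j]
        x0 = x = rng.randrange(3)  # X_0 drawn from the uniform stationary law
        s = 0
        for _ in range(n):
            probs = [P[x][y] * math.exp(theta * (y + 1)) * r[y] / (lam * r[x])
                     for y in range(3)]
            u, y = rng.random(), 0
            while y < 2 and u >= probs[y]:  # inverse-CDF draw of the next state
                u -= probs[y]
                y += 1
            x = y
            s += y + 1
        if in_event(s, n):
            # dQ/dP = sum_j w_j exp(theta_j S_n) r_j(X_n) / (lam_j^n r_j(X_0))
            dq_dp = sum(w * math.exp(t * s) * rj[x] / (lj ** n * rj[x0])
                        for w, t, (lj, rj) in zip(weights, thetas, eig))
            total += 1.0 / dq_dp
            total_sq += 1.0 / dq_dp ** 2
    est = total / runs
    se = math.sqrt(max(total_sq / runs - est * est, 0.0) / runs)
    return est, se


if __name__ == "__main__":
    print(perron(-0.507), rate(-0.507, 1.5), rate(0.815, 2.7))
    print(mixture_is(10), exact_prob(10))
```

Running `mixture_is(n)` next to `exact_prob(n)` for the values of $n$ in Table 1 gives a direct check of the mixture estimator against a value that does not depend on simulation.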



Example 2. Let $\varepsilon_1, \varepsilon_2, \ldots$ be i.i.d. three-dimensional standard normal
vectors and $1 \le n_0 < n_1 < \infty$. Consider the regime-switching Gaussian random
walk $S_n = \sum_{i=1}^{n} \xi_i$ with $\xi_i = (X_i - 2, 0, 0)' + \varepsilon_i$, where $\{X_n\}_{n \ge 0}$ is the
Markov chain in Example 1. Let $T_c = \inf\{n \ge n_0 : \|S_n\|^2 > cn\}$, which corresponds
to the case $g(\mu) = \|\mu\|^2$ in Theorem 5. To compute $p_c = P_\pi\{T_c \le n_1\}$
via importance sampling, where $\pi = (1/3, 1/3, 1/3)'$ is the stationary distribution
of $X_0$, we use a slight modification of (4.4) to define the discrete
mixture density function by

^ c ( / z)=/3 c {b^)]- 3 / 2 e -^M^^l {c/ni < 3(At) < c/no} 

(4.10) 

+ (c/n ) 3 / 2 e-^^)l {c/no<3(At) < 6} }, 



Table 1

  n     Direct Monte Carlo        Importance sampling,       Importance sampling,
                                  Q_n = P_{1.5,n}            Q_n given by (4.9)
  10    1.04(0.03) x 10^{-1}      1.1(0.3) x 10^{-1}         1.04(0.01) x 10^{-1}
  20    2.0(0.1) x 10^{-2}        1.91(0.03) x 10^{-2}       2.02(0.03) x 10^{-2}
  40    1.1(0.3) x 10^{-3}        1.25(0.02) x 10^{-3}       1.25(0.02) x 10^{-3}
  60    2(1) x 10^{-4}            0.93(0.02) x 10^{-4}       0.96(0.02) x 10^{-4}
  80                              7.4(0.2) x 10^{-6}         7.4(0.2) x 10^{-6}
  100                             5.9(0.1) x 10^{-7}         5.9(0.1) x 10^{-7}

for $\mu \in (c^{-1/2}\mathbf{Z})^3$, with $b > c/n_0$ and normalizing constant $\beta_c$ such that
$\sum_{\mu \in (c^{-1/2}\mathbf{Z})^3} w_c^*(\mu) = 1$. Note that

$(E[e^{\theta'\xi_1}\mathbf{1}_{\{X_1 = y\}} \mid X_0 = x])_{x,y \in \mathcal{X}} = e^{\|\theta\|^2/2} P_a$,

where $a$ is the first component of $\theta$ and

$P_a = \begin{pmatrix} 0.5e^{-a} & 0.3 & 0.2e^{a} \\ 0.2e^{-a} & 0.5 & 0.3e^{a} \\ 0.3e^{-a} & 0.2 & 0.5e^{a} \end{pmatrix}$.

Let $\Lambda(a)$ be the largest log-eigenvalue of $P_a$, with associated eigenvector
$r(\cdot; a)$. Then $\psi(\theta_\mu) = \Lambda(a_\mu) + \|\theta_\mu\|^2/2$, where we use superscripts to denote
the components of the vectors $\mu = (\mu^1, \mu^2, \mu^3)'$ and $\theta_\mu = (a_\mu, \theta_\mu^2, \theta_\mu^3)'$. Let $\dot\Lambda$
denote the derivative of $\Lambda$. Since $\mu = \nabla\psi(\theta_\mu) = (\dot\Lambda(a_\mu), 0, 0)' + \theta_\mu$, $\theta_\mu^2 = \mu^2$
and $\theta_\mu^3 = \mu^3$. Moreover, $\Lambda$ is convex and therefore we can use the bisection
method to solve the equation $\dot\Lambda(a) + a = \mu^1$ for $a = a_\mu$. We first generate
$\mu$ from the mixture density function (4.10) and then use the measure $P_{\theta_\mu}$
to generate $\xi_i = (X_i - 2, 0, 0)' + \theta_\mu + \varepsilon_i$, so that $\{X_n\}_{n \ge 0}$ has the transition
probabilities

$P_{a_\mu}(x, y)\, e^{-\Lambda(a_\mu)}\, r(y; a_\mu)/r(x; a_\mu)$, where $P_a(x, y)$ denotes the $(x, y)$ entry of $P_a$.
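
The bisection step for $a_\mu$ can be sketched as follows. This is an illustrative sketch only: $\Lambda(a)$ is computed by power iteration on $P_a$, its derivative is approximated by a central difference (the source does not say how $\dot\Lambda$ is evaluated), and the bracket $[-10, 10]$ assumes $|\mu^1| < 9$, which suffices because $|\dot\Lambda| \le 1$ here. The names `log_eig`, `dlog_eig` and `solve_a` are hypothetical.

```python
import math

# Transition matrix of the chain; index 0, 1, 2 stands for state 1, 2, 3.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]


def log_eig(a, iters=300):
    """Lambda(a): log of the largest eigenvalue of P_a(x,y) = p_xy*exp(a*(y-2))."""
    r = [1.0, 1.0, 1.0]
    lam = 1.0
    for _ in range(iters):
        # State y = index + 1, so y - 2 = index - 1.
        new = [sum(P[x][y] * math.exp(a * (y - 1)) * r[y] for y in range(3))
               for x in range(3)]
        lam = sum(new) / sum(r)
        total = sum(new)
        r = [3.0 * v / total for v in new]  # renormalise to avoid overflow
    return math.log(lam)


def dlog_eig(a, h=1e-5):
    """Central-difference approximation to the derivative of Lambda."""
    return (log_eig(a + h) - log_eig(a - h)) / (2.0 * h)


def solve_a(mu1, lo=-10.0, hi=10.0, steps=60):
    """Bisection for the root of dLambda(a) + a = mu1.

    The left side is increasing in a because Lambda is convex.
    """
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if dlog_eig(mid) + mid < mu1:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    a = solve_a(0.5)
    print(a, dlog_eig(a) + a)
```

With $a_\mu$ in hand, the remaining components of $\theta_\mu$ are simply $\mu^2$ and $\mu^3$, and the eigenvector returned by the same power iteration supplies $r(\cdot; a_\mu)$ for the tilted transition probabilities.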

Table 2 gives Monte Carlo estimates of $p_c$ for eight choices of $(c, n_0, n_1)$,
based on $m = 10{,}000$ runs for each entry, in which the standard error is shown
in parentheses. We compare direct Monte Carlo with importance sampling
using the mixture density function (4.10) in which $b = 7$. The results show
the effectiveness of using (4.10) to compute probabilities of order as small as
$10^{-7}$, and that direct Monte Carlo becomes unreliable even for probabilities
of order $10^{-4}$. Although extra time is used by importance sampling to compute
the likelihood ratio $L_{T_c}$, direct Monte Carlo and importance sampling
have similar simulation times because direct Monte Carlo has to generate
$X_n$ until $n = n_1$ for most runs whereas importance sampling can stop at
$T_c \le n_1$.

Table 2

  c     n_0    n_1     Direct Monte Carlo       Importance sampling
                                                with weights (4.10)
  20    5      50      3.0(0.2) x 10^{-2}       3.19(0.05) x 10^{-2}
  25    5      50      9(1) x 10^{-3}           8.57(0.08) x 10^{-3}
  30    5      50      3.1(0.6) x 10^{-3}       2.75(0.03) x 10^{-3}
  35    5      50      5(2) x 10^{-4}           5.58(0.07) x 10^{-4}
  40    10     100     4(2) x 10^{-4}           7.3(0.3) x 10^{-4}
  50    10     100                              3.37(0.09) x 10^{-5}
  60    10     100                              1.82(0.04) x 10^{-6}
  70    10     100                              9.2(0.2) x 10^{-8}



APPENDIX 



Proof of Theorem 5. Assume without loss of generality $r = 1$. First
note that (2.9) is still valid. Moreover, (2.12) still holds, as can be shown by
arguments similar to the proof of Theorem 6 of [5]. Define $\Gamma_c = \{\mu : \phi(\mu) \le
\delta^{-1} + \varepsilon_1\} \cap (c^{-1/2}\mathbf{Z})^d$ and $A_c(\mu_*) = \{T_c \le n_1, S_{T_c}/T_c \in K_c(\mu_*)\}$, where $K_c(\mu)$
is defined in (4.3). We next apply Theorem 4 to show that uniformly in
$\mu_* \in \Gamma_c$ with $K_c(\mu_*) \cap \Lambda^* \ne \emptyset$,



(A.l) 



exp< c 



-2 inf d>{fi)/g(fj,) 
+ sup <j>(n)/g(n) 



Define Z n (9) as in (3.12). Let i] c ^ = J Kc ^*)W c (n) d/x and w c (fJ,) =w c (fj,)/ 
jw. Then 



(A.2) 



-i 



Since Sk c {^*) w c{^) dfi = l, Jensen's inequality yields 



z Tc (e^)w c (ii)dn) < / z T 1 {e^ L )w c {n)dn. 

Putting this in (A.2), we obtain 

JKJa*) 



(A.3) 



sup -E[z 7 ; c 1 (^)i Ac(/ ,» ) ]. 



Since the function $h_\mu(\theta) := \theta'\mu - \psi(\theta)$ is maximized at $\theta = \theta_\mu$ with maximum
value $\phi(\mu)$, $\nabla h_\mu(\theta_\mu) = 0$ and there exists $\alpha_1 > 0$ such that $h_{S_{T_c}/T_c}(\theta_\mu) \ge
\phi(S_{T_c}/T_c) - \alpha_1 c^{-1}$ if $\mu$ and $S_{T_c}/T_c$ belong to $K_c(\mu_*)$. Therefore on $A_c(\mu_*)$
and for $\mu \in K_c(\mu_*)$,

(A.4) $\theta_\mu' S_{T_c} - T_c\psi(\theta_\mu) = T_c h_{S_{T_c}/T_c}(\theta_\mu) \ge T_c\phi(S_{T_c}/T_c) - T_c\alpha_1 c^{-1}$
$\ge c\,\phi(S_{T_c}/T_c)/g(S_{T_c}/T_c) - \alpha_2$
$\ge c \inf_{\mu \in K_c(\mu_*)} [\phi(\mu)/g(\mu)] - \alpha_2$.

In view of (A.4), application of Theorem 4 with £ = ^{0^) and Jensen's 
inequality as in the proof of Corollary 1 then shows that for \x £ K c ((j,*), 

<E[z^{e,)i M ^} 

(A.5) =r(xo;^)^ M [^ 2 (^)l Ac(At , ) ] 

= 0(e X p|-2c^mf^ ) [^)/ 9 ( / x)]|). 
From (2.1), it follows that uniformly in fi* S T c with K c (f/,*) n A* ^ 0, 
(A.6) ^-^ofcV^expjc sup [0(/i)/ 5 (//)])) . 

Combining (A. 3) with (A.5) and (A.6) yields (A.l). 

Making use of (A1)-(A5), we can use geometric integration as in [4], 
page 1651, to show that 



ex p j c 

M*er c :/< c (At*)nAV? 



-2 inf 



neK (jj*)\g(ji) 



+ sup 



0(c 



Combining this with (A.l), in which = 0(c( q ~ d " 2 e~ c ), then yields 
(A.7) Yl E Qt (Lll M ^) = 0(c"e- 2c ) = 0(p 2 c ) 

by (2.12). Moreover, the proof of Lemma 2 in [5], pages 418-419, can be 
used to show that 

(A.8) E Q ,(L 2 c l {Tc ^ nuSTc/T ^ A , } ) = o(e- 2c ). 

From (A.7) and (A.8), the desired conclusion follows. □

Proof of Theorem 6. Let $U$ be the stopping time (3.11) associated
with the fixed time $T = n$. We shall make use of the i.i.d. inter-regeneration
blocks as in the proof of Theorem 4 to show that

(A.9) $E_b(e^{-2\theta_b S_U + 2U\psi(\theta_b)}\mathbf{1}_{\{S_n/n \ge b\}}) = O(n^{-1/2} e^{-2n\phi(b)})$,

in which the additional $n^{-1/2}$ factor that is not present in Theorem 4 is due
to the use of a local limit bound

(A.10) $\sup P_b\{y \le \theta_b S_{n-k} - (n-k)\psi(\theta_b) \le y + 1\} = O(n^{-1/2})$,

in place of Blackwell's theorem for the renewal function (3.24). The proof
of (A.10) is given in the next paragraph. Making use of Theorem 3 and
Jensen's inequality and noting that $\{\theta_b S_n - n\psi(\theta_b) \ge n\phi(b)\} = \{S_n/n \ge b\}$,
it can be shown by using arguments similar to those in the proof of Corollary
1 that

(A.11) $E_b \hat p_n^2 / r^2(x_0; \theta_b) \le E_b(e^{-2\theta_b S_U + 2U\psi(\theta_b)}\mathbf{1}_{\{\theta_b S_n - n\psi(\theta_b) \ge n\phi(b)\}})$.

The desired conclusion then follows from (2.34) for the case $q = 0$ (see proof
of Theorem 2 of [5]) together with (A.9) and (A.11).

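Let $\tilde\xi_i = \theta_b \xi_i - \psi(\theta_b)$ and $\tilde S_i = \tilde\xi_1 + \cdots + \tilde\xi_i$. For the special case of i.i.d.
nonlattice $\tilde\xi_i$ with variance $\sigma^2 > 0$, Theorem 1 of [19] yields

(A.12) $P_b\{n\phi(b) - \sqrt{n}\sigma z - 1 \le \tilde S_n \le n\phi(b) - \sqrt{n}\sigma z\} = (2\pi n\sigma^2)^{-1/2}[e^{-z^2/2} + o(1)]$ as $n \to \infty$,

uniformly over $z \in \mathbf{R}$, so (A.10) holds. For Markov additive processes with
nonlattice increments and satisfying the minorization condition (3.1), Chan
and Lai [5] have shown that

$\int g(y) P_b\{X_n \in dy,\ n\phi(b) - \sqrt{n}\sigma z - 1 \le \tilde S_n \le n\phi(b) - \sqrt{n}\sigma z\} = (2\pi n\sigma^2)^{-1/2} e^{-z^2/2}\left\{\int g(y)\,d\pi_b(y) + o(1)\right\}$

for any nonnegative bounded measurable function $g$; see their (6.11) and
the arguments leading to their (6.12). Taking $g = 1$ then yields (A.12) and
therefore also (A.10). When the Markov additive process has lattice increments,
although one no longer has the precise formula (A.12), (A.10) still
holds by a modification of these equations in [5], similar to that used to
weaken (2.24) into (2.34) for lattice random walks.

To prove (A.9), let $\tau_*$ be the last regeneration time at or before time $n$.
Then analogously to (3.25),

(A.13) $E_b(e^{2[n\phi(b) - \tilde S_U]}\mathbf{1}_{\{\tilde S_n \ge n\phi(b)\}}) = \sum_{k=0}^{n}\sum_{y=-\infty}^{\infty} E_b(e^{2[n\phi(b) - \tilde S_U]}\mathbf{1}_{\{\tilde S_n \ge n\phi(b)\}}\mathbf{1}_{\{\tau_* = n-k,\ n\phi(b) - y - 1 < \tilde S_{\tau_*} \le n\phi(b) - y\}})$.

Let $k < n$. Using the decomposition $\tilde S_U = \tilde S_{n-k} + (\tilde S_U - \tilde S_{n-k})$ on the event
$\{\tau_* = n - k\}$, we obtain

(A.14) $E_b(e^{2[n\phi(b) - \tilde S_U]}\mathbf{1}_{\{\tilde S_n \ge n\phi(b)\}}\mathbf{1}_{\{\tau_* = n-k,\ n\phi(b) - y - 1 < \tilde S_{\tau_*} \le n\phi(b) - y\}})$
$= E_b(e^{2[n\phi(b) - \tilde S_{n-k}]} e^{-2(\tilde S_U - \tilde S_{n-k})}\mathbf{1}_{\{\tau_* = n-k,\ n\phi(b) - y - 1 < \tilde S_{n-k} \le n\phi(b) - y\}}\mathbf{1}_{\{\tilde S_n \ge n\phi(b)\}})$
$\le e^{2(y+1)} E_b(e^{-2(\tilde S_U - \tilde S_{n-k})}\mathbf{1}_{\{\tau_* = n-k,\ \tilde S_n - \tilde S_{\tau_*} \ge y,\ n\phi(b) - y - 1 < \tilde S_{\tau_*} \le n\phi(b) - y\}})$.

Conditioned on the event $\{\tau(i) = n - k,\ n\phi(b) - y - 1 < \tilde S_{\tau(i)} \le n\phi(b) - y\}$, the
vector $(\tilde S_{\tau(i+1)} - \tilde S_{\tau(i)}, \tilde S_n - \tilde S_{\tau(i)}, \tau(i+1) - \tau(i))$ has the same distribution
as $(\tilde S_\tau, \tilde S_k, \tau)$ that is initialized at the regeneration distribution under $P_b$.
Hence by (A.10), for $k \le n^{1/2}$,

(A.15) $E_b(e^{-2(\tilde S_U - \tilde S_{n-k})}\mathbf{1}_{\{\tau_* = n-k,\ \tilde S_n - \tilde S_{\tau_*} \ge y,\ n\phi(b) - y - 1 < \tilde S_{\tau_*} \le n\phi(b) - y\}})$
$= \sum_{i=0}^{\infty} E_b(e^{-2(\tilde S_{\tau(i+1)} - \tilde S_{\tau(i)})}\mathbf{1}_{\{\tau(i) = n-k,\ \tau(i+1) - \tau(i) > k,\ \tilde S_n - \tilde S_{\tau(i)} \ge y,\ n\phi(b) - y - 1 < \tilde S_{\tau(i)} \le n\phi(b) - y\}})$
$= E_{\nu_b,b}(e^{-2\tilde S_\tau}\mathbf{1}_{\{\tilde S_k \ge y,\ \tau > k\}}) \times \sum_{i=0}^{\infty} P_b\{\tau(i) = n-k,\ n\phi(b) - y - 1 < \tilde S_{\tau(i)} \le n\phi(b) - y\}$
$\le E_{\nu_b,b}(e^{-2\tilde S_\tau}\mathbf{1}_{\{\tilde S_k \ge y,\ \tau > k\}}) \times P_b\{\tau(i) = n-k \text{ for some } i,\ n\phi(b) - y - 1 < \tilde S_{n-k} \le n\phi(b) - y\}$
$= O(n^{-1/2}) E_{\nu_b,b}(e^{-2\tilde S_\tau}\mathbf{1}_{\{\tilde S_k \ge y,\ \tau > k\}})$.

Moreover,

(A.16) $\sum_{0 \le k \le n^{1/2}}\sum_{y=-\infty}^{\infty} E_{\nu_b,b}(e^{2(y+1) - 2\tilde S_\tau}\mathbf{1}_{\{\tilde S_k \ge y\}}\mathbf{1}_{\{\tau > k\}})$
$\le [e^2/(1 - e^{-2})]\sum_{k=0}^{\infty} E_{\nu_b,b}(e^{2\tilde S_k - 2\tilde S_\tau}\mathbf{1}_{\{\tau > k\}})$
$= [e^2/(1 - e^{-2})] E_{\nu_b,b}\left(\sum_{k=0}^{\tau-1} e^{2(\tilde S_k - \tilde S_\tau)}\right)$
$= [e^2/(1 - e^{-2})] E_\nu\left(\sum_{k=0}^{\tau-1} e^{2\tilde S_k - \tilde S_\tau}\right)$,

in which the last equality can be shown by using the same arguments as
(3.27). By (A.14)-(A.16),

(A.17) $\sum_{0 \le k \le n^{1/2}}\sum_{y=-\infty}^{\infty} E_b(e^{2[n\phi(b) - \tilde S_U]}\mathbf{1}_{\{\tilde S_n \ge n\phi(b)\}}\mathbf{1}_{\{\tau_* = n-k,\ n\phi(b) - y - 1 < \tilde S_{\tau_*} \le n\phi(b) - y\}})$
$= O(n^{-1/2}) E_\nu\left(\sum_{k=0}^{\tau-1} e^{2\tilde S_k - \tilde S_\tau}\right) = O(n^{-1/2})$,

where the last relation follows from Lemma 2 below.

Let $n^{1/2} < k < n$ and $\lambda = [4\psi(\theta_b) - \zeta]/2$. The bound in (A.15) can be
modified to

$E_b(e^{-2(\tilde S_U - \tilde S_{n-k})}\mathbf{1}_{\{\tau_* = n-k,\ \tilde S_n - \tilde S_{\tau_*} \ge y,\ n\phi(b) - y - 1 < \tilde S_{\tau_*} \le n\phi(b) - y\}}) \le E_{\nu_b,b}(e^{-2\tilde S_\tau}\mathbf{1}_{\{\tilde S_k \ge y,\ \tau > k\}})$,

and therefore by (A.14), we can modify (A.16) and (A.17) to

(A.18) $\sum_{n^{1/2} < k < n}\sum_{y=-\infty}^{\infty} E_b(e^{2[n\phi(b) - \tilde S_U]}\mathbf{1}_{\{\tilde S_n \ge n\phi(b)\}}\mathbf{1}_{\{\tau_* = n-k,\ n\phi(b) - y - 1 < \tilde S_{\tau_*} \le n\phi(b) - y\}})$
$\le e^{-n^{1/2}\lambda}\sum_{n^{1/2} < k < n}\sum_{y=-\infty}^{\infty} E_b(e^{2[n\phi(b) - \tilde S_U] + k\lambda}\mathbf{1}_{\{\tilde S_n \ge n\phi(b)\}}\mathbf{1}_{\{\tau_* = n-k,\ n\phi(b) - y - 1 < \tilde S_{\tau_*} \le n\phi(b) - y\}})$
$\le [e^{-n^{1/2}\lambda + 2}/(1 - e^{-2})] E_\nu\left(\sum_{k=0}^{\tau-1} e^{2\tilde S_k - \tilde S_\tau + k\lambda}\right) = O(e^{-n^{1/2}\lambda})$,

since $E_\nu(\sum_{k=0}^{\tau-1} e^{2\tilde S_k - \tilde S_\tau + k\lambda}) < \infty$ by Lemma 2 below. Finally, for the case
$k = n$, by the Cauchy-Schwarz inequality,

(A.19) $\sum_{y=-\infty}^{\infty} E_b(e^{2[n\phi(b) - \tilde S_U]}\mathbf{1}_{\{\tilde S_n \ge n\phi(b)\}}\mathbf{1}_{\{\tau_* = 0,\ n\phi(b) - y - 1 < \tilde S_{\tau_*} \le n\phi(b) - y\}})$
$= E_b(e^{2[n\phi(b) - \tilde S_\tau]}\mathbf{1}_{\{\tilde S_n \ge n\phi(b),\ \tau > n\}}) \le e^{-n\lambda} E_b(e^{2(\tilde S_n - \tilde S_\tau) + n\lambda}\mathbf{1}_{\{\tau > n\}})$
$= e^{-n\lambda} E(e^{2\tilde S_n - \tilde S_\tau + n\lambda}\mathbf{1}_{\{\tau > n\}})/r(x_0; \theta_b) = O(e^{-n\lambda})$,

where the last relation follows from Lemma 2. From (A.13) and (A.17)-
(A.19), (A.9) follows. □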

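Lemma 2. With the same notation and assumptions as in Theorem 6,
let $\tilde\xi_i = \theta_b\xi_i - \psi(\theta_b)$, $\tilde S_i = \tilde\xi_1 + \cdots + \tilde\xi_i$ and $\lambda = [4\psi(\theta_b) - \zeta]/2$. Then

\[
E_{\nu}\Bigl(\sum_{k=0}^{\tau-1} e^{2\tilde S_k-\tilde S_\tau}\Bigr)
  + E_{\nu}\Bigl(\sum_{k=0}^{\tau-1} e^{2\tilde S_k-\tilde S_\tau+k\lambda}\Bigr) < \infty,
\]
\[
E\bigl(e^{2\tilde S_n-\tilde S_\tau+n\lambda}1_{\{\tau>n\}}\bigr) = O(1),
\]

where $\tau$ is the regeneration time as in (A.15)-(A.19).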

PROOF. Let $W_k = e^{2\tilde S_k+k\lambda}1_{\{\tau>k\}}$ and $Y = e^{-\tilde S_\tau}$. Then



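\[
\begin{aligned}
E_{\nu}\Bigl(\sum_{k=0}^{\tau-1} e^{2\tilde S_k-\tilde S_\tau+k\lambda}\Bigr)
&= E_{\nu}\Bigl(\sum_{k=0}^{\infty} W_kY\Bigr)
 = E_{\nu}\Bigl(\sum_{k=0}^{\infty} W_kY1_{\{\tau>k\}}\Bigr) \\
&\le \tfrac12\,E_{\nu}\Bigl(\sum_{k=0}^{\infty}\bigl(W_k^{2}+Y^{2}1_{\{\tau>k\}}\bigr)\Bigr) \\
&= \{\ell_{\nu}(4\theta_b,\zeta)+E_{\nu}(\tau Y^{2})\}/2.
\end{aligned}
\]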
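Since $(-2\theta_b, 2\psi(\theta_b)) \in \Omega$ and $\Omega$ is open, $E_{\nu}(\tau Y^{2}) < \infty$. Therefore

\[
E_{\nu}\Bigl(\sum_{k=0}^{\tau-1} e^{2\tilde S_k-\tilde S_\tau+k\lambda}\Bigr) < \infty.
\]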

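Since $\lambda > 0$, this implies that $E_{\nu}(\sum_{k=0}^{\tau-1} e^{2\tilde S_k-\tilde S_\tau})$ is also finite. By the Cauchy-Schwarz
inequality,

\[
E\bigl(e^{2\tilde S_n-\tilde S_\tau+n\lambda}1_{\{\tau>n\}}\bigr)
  \le \bigl[E e^{4\tilde S_n+2n\lambda}1_{\{\tau>n\}}\bigr]^{1/2}\bigl[E e^{-2\tilde S_\tau}\bigr]^{1/2} = O(1),
\]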
since $E e^{-2\tilde S_\tau} = EY^{2} < \infty$ and since $E(e^{4\tilde S_n+2n\lambda}1_{\{\tau>n\}}) = E(e^{4\theta_b S_n-n\zeta}1_{\{\tau>n\}}) \le
\ell_{\nu}(4\theta_b,\zeta) < \infty$. □
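Two elementary facts carry the proof of Lemma 2. The sketch below records them under the assumption that the definitions in the statement of the lemma read $\tilde S_n = \theta_b S_n - n\psi(\theta_b)$ and $\lambda = [4\psi(\theta_b)-\zeta]/2$, with $W_k$ and $Y$ as in the proof; it is a reading aid rather than part of the original argument.

```latex
% (i) The choice of \lambda cancels the \psi(\theta_b) terms in the exponent:
\[
4\tilde S_n + 2n\lambda
  = 4\theta_b S_n - 4n\psi(\theta_b) + n[4\psi(\theta_b)-\zeta]
  = 4\theta_b S_n - n\zeta .
\]
% (ii) Since W_k already carries the indicator of \{\tau>k\}, the inequality
%      ab \le (a^2+b^2)/2 splits the product W_k Y termwise, and the Y^2 terms sum to \tau Y^2:
\[
W_kY = W_kY1_{\{\tau>k\}} \le \tfrac12\bigl(W_k^{2}+Y^{2}1_{\{\tau>k\}}\bigr),
\qquad
\sum_{k=0}^{\infty} Y^{2}1_{\{\tau>k\}} = \tau Y^{2}.
\]
```

Fact (i) is what makes the moment condition at $(4\theta_b,\zeta)$ the natural one for the Cauchy-Schwarz step, and fact (ii) explains the appearance of $E_{\nu}(\tau Y^{2})$ in the bound.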

PROOF of Theorem 7. The first step is to generalize Theorem 2 of 
[4] to the Markov case. This can be done by combining the basic ideas 
of the proof of that theorem with those of the proof of Theorem 3 of [5]. 
Assuming r = 1 without loss of generality, we can use these arguments to 
show that (2.24) still holds in the nonlattice case and that (2.34) holds 
without the nonlattice assumption. Whereas $p_n = E_{Q^*}\hat p_n$ can be analyzed
by using Chan and Lai's [5] truncation argument to handle $1/r(X_n;\theta_\mu)$ in
(2.15) [see (3.34)], the analysis of $E_{Q^*}\hat p_n^{2}$ involves $1/r^{2}(X_n;\theta_\mu)$, which does
not relate to the finiteness of eigenmeasures via the truncation argument. For
the special case $d = 1$ and $g(\mu) = \mu$, the proof of Theorem 6 uses regeneration
and Theorem 3 to circumvent this difficulty. Note that in this special case,
the exponential tilting involves a single $\theta_b$ instead of a mixture of $\theta_\mu$'s. We
can use geometric integration over a suitably chosen tubular neighborhood 
of the manifold M as in the proof of Theorem 2 of [4] to piece together the 
conclusions of Theorem 6 for the local tiltings. □ 

Proof of Corollary 4. Let Q n = Y^ueG^^nP^n- Arguments similar 
to those in (4.6) can be used to show 



(A.20) EQ n [L 2 n l{ g ( Sn /n)>b}\ < ^mA-^ 

fJ.£G 



( dFn \\ 



Noting that $\{S_n/n \in H(\mu)\} = \{\theta_\mu' S_n - n\psi(\theta_\mu) > n\phi(\mu)\}$, it follows from the
proof of Theorem 6 [in particular from the multidimensional versions of
(A.9) and (A.11)] that



\[
E_{\mu}\Bigl[\Bigl(\frac{dP_n}{dP_{\mu,n}}\Bigr)^{2} 1_{\{S_n/n\in H(\mu)\}}\Bigr]
  = O(n^{-1/2}e^{-2n\phi(\mu)}), \qquad \mu\in G.
\tag{A.21}
\]



By (2.34) with q = and with c replaced by the more general c/r, p n is of 
the order n~ l / 2 e~ bn / T , and therefore Q n is asymptotically optimal in view 
of (4.8), (A.20) and (A.21). □ 
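To make the notion of asymptotic optimality concrete, here is a minimal numerical sketch in a much simpler setting than the one treated above: i.i.d. standard normal increments and a single exponential tilting, rather than a Markov additive process with a mixture of tiltings. All names (`exact_tail`, `tilted_estimate`) and parameter values are illustrative assumptions and do not come from the paper. In this Gaussian case $\psi(\theta) = \theta^{2}/2$, so $\theta_b = b$, and the sketch estimates both $p_n = P(S_n/n > b)$ and the second moment of the importance sampling estimator under the tilted measure.

```python
import math
import random


def exact_tail(n: int, b: float) -> float:
    """P(S_n / n > b) for i.i.d. N(0, 1) increments, from the normal tail."""
    return 0.5 * math.erfc(b * math.sqrt(n) / math.sqrt(2.0))


def tilted_estimate(n: int, b: float, runs: int, seed: int = 0):
    """Importance sampling under the exponentially tilted law N(theta_b, 1).

    Assumption: Gaussian increments, so psi(theta) = theta^2 / 2 and theta_b = b.
    Returns (estimate of p_n, estimate of the second moment E_Q[L^2 1_A]).
    """
    rng = random.Random(seed)
    theta = b                       # solves psi'(theta) = b
    psi = 0.5 * theta * theta       # log moment generating function at theta
    first = 0.0
    second = 0.0
    for _ in range(runs):
        # Simulate S_n under the tilted measure Q.
        s = sum(rng.gauss(theta, 1.0) for _ in range(n))
        if s > n * b:
            # Likelihood ratio L = dP/dQ evaluated on the simulated path.
            lr = math.exp(-theta * s + n * psi)
            first += lr
            second += lr * lr
    return first / runs, second / runs


if __name__ == "__main__":
    p = exact_tail(50, 0.5)
    est, m2 = tilted_estimate(50, 0.5, runs=20000, seed=1)
    print("exact p_n:", p)
    print("estimate / exact:", est / p)
    print("second moment / p_n^2:", m2 / (p * p))
```

The quantity of interest is the last ratio: for direct Monte Carlo the corresponding ratio is $1/p_n$, which grows exponentially in $n$, whereas for the tilted measure in this toy case it should stay of moderate size. The sketch does not reproduce the mixture construction $Q_n$ of Corollary 4.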